Abstract


We present a new procedure for inferring the structure of a finite-state automaton (FSA) 
from its input/output behavior, using access to the automaton to perform experiments. 

Our procedure uses a new representation for FSA’s, based on the notion of equivalence 
between tests. We call the number of such equivalence classes the diversity of the automaton; 
the diversity may be as small as the logarithm of the number of states of the automaton. For 
the special class of permutation automata, we show that our inference procedure runs in time 
polynomial in the diversity and $\log(1/\epsilon)$, where $\epsilon$ is a given upper bound on the probability
that our procedure returns an incorrect result. (Since our procedure uses randomization to 
perform experiments, there is a certain controllable chance that it will return an erroneous 
result.) We also discuss techniques for handling more general automata. 

We present evidence for the practical efficiency of our approach. For example, our 
procedure is able to infer the structure of an automaton based on Rubik’s Cube (which 
has approximately $10^{19}$ states) in about 2 minutes on a DEC MicroVax. This automaton
is many orders of magnitude larger than possible with previous techniques, which would 
require time proportional at least to the number of global states. (Note that in this example, 
only a small fraction ($10^{-14}$) of the global states were even visited.)

Finally, we present a new procedure for inferring automata of a special type in which the 
global state is composed of a vector of binary local state variables, all of which are observable 
(or visible) to the experimenter. Our inference procedure runs provably in time polynomial 
in the size of this vector (which happens to be the diversity of the automaton), even though 
the global state space may be exponentially larger. The procedure plans and executes 
experiments on the unknown automaton; we show that the number of input symbols given 
to the automaton during this process is (to within a constant factor) the best possible. 

Portions of this thesis are joint work with Ronald Rivest. 
Chapter 1


Introduction 


We address the problem of inferring a description of a deterministic finite-state automaton 
from its input/output behavior. 

Our motivation is the “artificial intelligence” problem of identifying an environment 
by experimentation. We imagine a robot wandering around in an unknown environment, 
whose characteristics must be discovered. Such an environment need not be deterministic, 
or even finite-state, so the approach suggested here is only a beginning on the more general 
problem. 

In line with our motivation, our inference procedures experiment with the automaton 
to gather information. 

A unique and valuable feature of our procedures is that they do not need to have the 
automaton “reset” to some start state or “backed-up” to a previous state; the data-gathering 
is one continuous experiment (as in real life). 

Our procedures are practical; their time and memory requirements are quite reasonable. 
For example, our procedures do not need to store the entire observed input/output history. 

In Chapters 2 and 3, we present a new representation of finite automata based on 
the notion of test equivalence. We present and prove the effectiveness of a probabilistic 
algorithm for inferring permutation automata. We also discuss possible techniques for 
handling more general automata, and give some preliminary experimental results. 

In Chapter 4, we extend the work of the preceding chapters focusing on one aspect of 
the inference problem, namely, that of planning experiments for gathering information. 


Portions of this thesis were previously described in [14,15]. 


1.1 Previous Work 


For a fascinating discussion of the problem of inferring an environment from experience, 
the reader is encouraged to read Drescher [5], whose approach is based on the principles of 
Piaget. 

Kohavi [12] gives a fine introduction to the theory of finite-state automata, as do
Hartmanis and Stearns [11]. An excellent overview of the entire field of inductive inference is
given by Angluin and Smith [4].

The problem of inferring a finite-state automaton from its input/output behavior has 
a long history. In [9], Gold presents a number of recursion theoretic results concerning 
several language classes, including the regular languages. Gold looks first at the problem of 
identifying a language when given a particular presentation of the language. In one case, the 
learner is provided with an infinite stream in arbitrary order of all strings in the language. 
In the second case, the learner is given an infinite stream of all finite strings generated by 
the alphabet, each string labeled as to whether it is or is not an element of the language. 
In his model, the learner guesses the identity of the language after each example, and is 
said to learn the language “in the limit” if, after a finite number of examples, the learner’s 
guesses converge to the right answer and the learner never again changes its guess. Gold 
shows regular languages can be learned in the limit in the second case described above, but 
not in the first. 

In the same paper, Gold describes the problem of “black box” identification, closely 
related to the particular problem that we are here addressing. In this situation, the learner 
is able to experiment with an unknown black box. At each time step, the learner supplies 
the black box with one of a finite number of input symbols and the black box in turn 
outputs an output symbol calculated as a function of the input symbols provided to it up 
to that point in time. Gold shows that if the black box is a finite automaton, then it can be 
identified in the limit. Note that Gold’s results were recursion theoretic and did not address 
the time complexity of any of these problems. 

In [10], Gold examines more closely the problem of inferring a black box finite automa- 
ton. In this paper, Gold assumes that the experimenter has available to it a means of 
resetting the automaton to some initial state. He describes how the automaton can be 
identified in the limit, how experiments can be efficiently planned, and how the automaton 
can be identified in a finite amount of time if the learner is given beforehand the number 
of states of the automaton. 

Angluin [2] elaborates this algorithm to show how to infer an automaton with active 
experimentation. In her model, the learner has a “minimally adequate teacher” who can 
answer two kinds of queries: First, the teacher will tell the learner whether any particular 
string is a member of the unknown language. Second, the teacher is able to supply the 
learner with a counterexample to an incorrect conjecture of the automaton’s identity. An- 
gluin shows that the number of queries required by her algorithm to correctly identify the 
unknown automaton is only polynomial in the number of states of the automaton. 

Angluin [3] and Gold [8] prove that finding an automaton of n states or less agreeing
with a given sample of input/output pairs is NP-complete. Note that in this situation the
inference algorithm does not have access to the automaton; the input/output pairs are
given and the learner is not able to experiment with the automaton it is trying to identify.

Finally, Angluin [1] shows how to infer in polynomial time a special class of finite-state
automata, called “k-reversible” automata, from a sample of input/output behavior. Later,
we will give special consideration to the class of permutation automata, of which the
zero-reversible automata are a subclass.


Chapter 2 
A New Representation of Finite Automata 


2.1 Automata and Environments 


Our definition of a finite-state automaton is a generalization of the usual Moore
automaton [12]. (Our approach generalizes to handle Mealy automata; however, we find Moore
automata more natural.)


Definition 1 A finite-state automaton $\mathcal{E}$ is a 6-tuple $(Q, B, P, q_0, \delta, \gamma)$ where

• $Q$ is a finite nonempty set of states.

• $B$ is a finite nonempty set of input symbols, also called basic actions.

• $P$ is a finite nonempty set of predicate symbols, also called sensations.

• $q_0$, a member of $Q$, is the initial state.

• $\delta$ is a function from $Q \times B$ into $Q$; $\delta$ is called the next-state function.

• $\gamma$ is a function from $Q \times P$ into {true, false}.


When P only contains a single predicate (e.g. accept), we have the standard definition 
of a Moore automaton. We allow multiple predicates to correspond to the notion of a robot 
having multiple sensations in a given state of the environment. 

We assume henceforth that we are dealing with a particular finite-state automaton
$\mathcal{E} = (Q, B, P, q_0, \delta, \gamma)$, which we call the environment of the learning procedure.

We say that $\mathcal{E}$ is a permutation environment if for each $b \in B$, the function $\delta(\cdot, b)$ is a
permutation of $Q$.

We let $A = B^*$ denote the set of all sequences of zero or more basic actions in the
environment $\mathcal{E}$; $A$ is the set of actions possible in the environment $\mathcal{E}$, including the null
action $\lambda$.

If $q$ is a state in $Q$, and $a = b_1 b_2 \ldots b_n$ is an action in $A$, we let $qa = q b_1 b_2 \ldots b_n$ denote
the state resulting from applying action $a$ to state $q$:

$$qa = \delta(\ldots\delta(\delta(q, b_1), b_2)\ldots, b_n). \tag{2.1}$$

(The basic actions are performed in the order $b_1, b_2, \ldots, b_n$.) Similarly, if $q$ is a state and $p$
is a predicate, we let $qp = \gamma(q, p)$ denote the result of applying predicate $p$ to state $q$.

We say that $\mathcal{E}$ is strongly connected if

$$(\forall q \in Q)(\forall r \in Q)(\exists a \in A)\; qa = r. \tag{2.2}$$

We do not assume that $\mathcal{E}$ is strongly connected in our general discussion of automata and
diversity.

However, when we describe our inference procedure, we will make this assumption with
little loss of generality: If $\mathcal{E}$ is not strongly connected, then an experimenting inference
procedure, having no “reset” operation, will sooner or later fall into a strongly connected
component of the state space from which it cannot escape, and so will have to be content
thereafter learning only about that component.
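
As an illustration of the definitions in this section, the following is a minimal sketch, not part of the original development. The class name `Environment`, its method names, and the three-state cyclic example are all assumptions made for illustration; the sketch stores $\delta$ and $\gamma$ as dictionaries and checks the permutation and strong-connectivity properties by brute force.

```python
class Environment:
    """Finite-state environment (Q, B, P, q0, delta, gamma) in the sense of Definition 1."""

    def __init__(self, states, actions, predicates, q0, delta, gamma):
        self.states = list(states)
        self.actions = list(actions)
        self.predicates = list(predicates)
        self.q0 = q0
        self.delta = delta  # dict: (state, basic action) -> state
        self.gamma = gamma  # dict: (state, predicate) -> bool

    def run(self, q, action):
        """Return qa: apply the basic actions of `action` to q in order (equation 2.1)."""
        for b in action:
            q = self.delta[(q, b)]
        return q

    def test(self, q, action, p):
        """Return the value of the test (action followed by predicate p) at state q."""
        return self.gamma[(self.run(q, action), p)]

    def is_permutation(self):
        """True if every basic action maps the state set one-to-one onto itself."""
        n = len(self.states)
        return all(len({self.delta[(q, b)] for q in self.states}) == n
                   for b in self.actions)

    def is_strongly_connected(self):
        """True if every state can reach every other state (equation 2.2)."""
        def reachable(src):
            seen, stack = {src}, [src]
            while stack:
                q = stack.pop()
                for b in self.actions:
                    r = self.delta[(q, b)]
                    if r not in seen:
                        seen.add(r)
                        stack.append(r)
            return seen
        n = len(self.states)
        return all(len(reachable(q)) == n for q in self.states)


# Hypothetical example: three states on a cycle, one action "s" (step), and
# one predicate "at0" that is true only in state 0.
states = [0, 1, 2]
delta = {(q, "s"): (q + 1) % 3 for q in states}
gamma = {(q, "at0"): q == 0 for q in states}
env = Environment(states, ["s"], ["at0"], 0, delta, gamma)
```

In this example `env.run(0, ["s", "s"])` is state 2, and the environment is both a permutation environment and strongly connected.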


2.2 Tests 


A test is an element of $AP$, that is, an action followed by a predicate. We let $T$ denote the
set of tests $AP$. We say that a test $t = ap$ succeeds at state $q$ if $qt = q(ap) = qap = (qa)p$
is true. Otherwise we say that $t$ fails at $q$. The length $|t|$ of a test $t$ is the number of basic
actions and predicates it contains.

We say that $\mathcal{E}$ is reduced if every pair of states can be distinguished by executing some
test:

$$(\forall q \in Q)(\forall r \in Q)\,(q \neq r \Rightarrow (\exists t \in T)\; qt \neq rt). \tag{2.3}$$

We assume henceforth that $\mathcal{E}$ is reduced.

We say that a robot has a perfect model of its environment if it can predict perfectly
what sensations would result from any desired sequence of basic actions, that is, if it knows
the value of every test in the current state. The goal of our inference procedures is to build
a perfect model of the given environment.


2.3 Equivalence of Tests and Diversity 


A central notion in our development is that of test equivalence. 


We say that tests $t_1$ and $t_2$ are equivalent, written $t_1 \equiv t_2$, if

$$(\forall q \in Q)(q t_1 = q t_2); \tag{2.4}$$

that is, from any state the two tests yield the same result.

The equivalence relation on tests partitions the set $T$ of tests into equivalence classes.
The equivalence class containing a test $t$ will be denoted $[t]$.

The diversity of the environment $\mathcal{E}$, denoted $D(\mathcal{E})$, is the number of equivalence classes
of tests of $\mathcal{E}$:

$$D(\mathcal{E}) = |\{[t] \mid t \in T\}|. \tag{2.5}$$
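
When the automaton itself is known, the diversity can be computed by brute force. The sketch below is an illustration only, under the assumption that $\delta$ and $\gamma$ are given as dictionaries; the function name `diversity` and both example environments are hypothetical. It identifies each equivalence class with the set of states at which its tests succeed, and closes the predicates' sets under prepending a basic action: the test $bt$ succeeds at $q$ exactly when $t$ succeeds at $qb$.

```python
from itertools import product


def diversity(states, actions, predicates, delta, gamma):
    """Count test equivalence classes of a fully known environment.

    A class is represented by its signature: the frozenset of states at
    which every test in the class succeeds.
    """
    def prepend(b, sig):
        # Signature of bt, given the signature of t.
        return frozenset(q for q in states if delta[(q, b)] in sig)

    start = {frozenset(q for q in states if gamma[(q, p)]) for p in predicates}
    seen, frontier = set(start), list(start)
    while frontier:
        sig = frontier.pop()
        for b in actions:
            nxt = prepend(b, sig)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


# Hypothetical example 1: a 3-state cycle with one predicate true at state 0.
cycle_states = [0, 1, 2]
cycle_delta = {(q, "s"): (q + 1) % 3 for q in cycle_states}
cycle_gamma = {(q, "at0"): q == 0 for q in cycle_states}
d_cycle = diversity(cycle_states, ["s"], ["at0"], cycle_delta, cycle_gamma)

# Hypothetical example 2: 3-bit words, a single identity action, and one
# predicate per bit; 8 states but only 3 equivalence classes.
bit_states = list(product([0, 1], repeat=3))
bit_delta = {(q, "id"): q for q in bit_states}
bit_gamma = {(q, i): q[i] == 1 for q in bit_states for i in range(3)}
d_bits = diversity(bit_states, ["id"], [0, 1, 2], bit_delta, bit_gamma)
```

Because a signature is a subset of $Q$, the loop can visit at most $2^{|Q|}$ signatures, so the computation always terminates.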


The following theorem demonstrates that the diversity of a finite-state automaton is
always finite, but is only loosely related to the size (i.e. number of states) of the automaton.

Theorem 1 For any reduced finite-state automaton $\mathcal{E} = (Q, B, P, q_0, \delta, \gamma)$,

$$\lg(|Q|) \le D(\mathcal{E}) \le 2^{|Q|}.$$

Proof: The first inequality $\lg(|Q|) \le D(\mathcal{E})$, or equivalently $|Q| \le 2^{D(\mathcal{E})}$, holds because a
state is uniquely identified by the set of (equivalence classes of) tests which are true at that
state, since $\mathcal{E}$ is reduced. The second inequality holds because the equivalence class that a
test belongs to is uniquely defined by the set of states at which that test succeeds. ∎

Theorem 2 The lower and upper bounds on $D(\mathcal{E})$ given in Theorem 1 are best possible.

Proof: For the lower bound, consider an environment where the states are $n$-bit words,
and, for $1 \le i \le n$, there is a predicate $p_i$ which tests whether the $i$-th bit is one. The
set $B$ consists of a single action, which is the identity operation (no state change). Then
$D(\mathcal{E}) = n$ but $|Q| = 2^n$. Note that the state space in this example is unconnected.

For the upper bound, consider an automaton whose states are represented by an element
$x$ which is either an $n$-bit vector $(x_1, \ldots, x_n)$ or the special value hit; there are $1 + 2^n$ states.
The only predicate tests whether $x$ = hit. The following actions are available:

• For each $i \in \{1, \ldots, n\}$, an action which flips $x_i$ if $x \neq$ hit, and leaves $x$ alone
otherwise.

• An action which sets $x$ to hit if $x$ is the all-zero vector $0^n$, and leaves $x$ alone otherwise.

Using these actions, for any subset $X$ of the $n$-bit vectors, it is possible to construct a
test which is true if and only if the initial state begins with $x \in X$ or $x$ = hit initially.
(Selective complementation can bring $x$ into the all-zero state iff it was originally in some
particular $n$-bit state $y$; this state can then be transformed to hit, otherwise the original
state can be restored by undoing the selective complementation. This can be repeated for
each $y \in X$.) Actually, this environment only comes within a factor of two of the upper
bound; its diversity is $2^{|Q|-1}$.

However, the following alternative environment does achieve the upper bound, although
its set of basic actions is enormous. The environment consists of $n$ states numbered 0
through $n - 1$, and has a single predicate $p$ which succeeds only at state 0. For each subset
$X$ of the states, there is an action $b_X$ which moves state $x$ to state 0 if $x \in X$, or to
state 1 otherwise. Thus, the test $b_X p$ is true iff we are in one of the states in $X$. Hence,
$D(\mathcal{E}) = 2^{|Q|}$. ∎

We propose that the notion of diversity is more suitable than that of size for many
natural applications. To support this viewpoint, we will demonstrate that there exists a
natural encoding of a finite-state automaton, whose size is polynomial in the diversity of the
automaton. Furthermore, it is straightforward to use this representation to simulate the
behavior of the automaton.


2.4 The Update Graph 


As a convenient means of representing the test classes, we may build a directed graph in
which each vertex is an equivalence class, and an edge labeled $b \in B$ is directed from test
class $[t]$ to $[t']$ iff $t \equiv bt'$. We call this the update graph of the environment.

Since there is one vertex for each equivalence class, the size of the update graph is
precisely the diversity of $\mathcal{E}$. Note that, for $b \in B$, every vertex has exactly one $b$-edge
directed into it, since if $t \equiv t'$ then $bt \equiv bt'$.

Also, for any test $t = ap$ where $p$ is a predicate and $a = b_1 b_2 \ldots b_n$ is a sequence of basic
actions, there is a path in the update graph along which vertex $[p]$ can be reached from $[t]$
by following the edges labeled $b_1, b_2, \ldots, b_n$. Put another way, we can find $t$’s equivalence
class in the update graph by tracing backwards from $[p]$ along the unique path $b_n, \ldots, b_1$.

We associate with each vertex $[t]$ the value of $t$ at the current state $q$. (This value is
well defined since if $t \equiv t'$ then by definition of equivalence, $qt = qt'$.) When action $b$ is
executed, the test $[t']$ gets its value from $[t]$, where $t \equiv bt'$, yielding the new value of each
test in state $qb$. Thus, the update graph may be used to simulate the automaton, as we
prove in the following theorem.


2.5 The Simulation Theorem 


Theorem 3 To simulate $\mathcal{E}$ it suffices to know:

1. The update graph.

2. For each equivalence class $[t]$, the value $qt$ at the current state $q$.

Proof: Suppose the automaton moves from state $q$ to state $qb$, for some $b \in B$. We need
to compute $(qb)t = q(bt)$ for each equivalence class $[t]$. However, the test $bt$ belongs to that
(unique) equivalence class $[s]$ for which an edge labeled $b$ is directed from $[s]$ to $[t]$ in the
update graph. By assumption, we know $qs$; this is the desired value of $(qb)t$.
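
The proof translates directly into a simulation loop. The following sketch is illustrative only: the table name `source`, the function name `simulate`, and the hand-written update graph (for a hypothetical 3-state cycle with one action "s" and one predicate "p" that is true only in state 0) are assumptions, not part of the original text. Here `source[(t, b)]` names the class $[bt]$, that is, the tail of the $b$-edge directed into $[t]$.

```python
def simulate(source, values, action):
    """Return the class values after executing `action`.

    source[(t, b)] is the class [bt]; values maps each class to its value
    at the current state. Only the update graph and the current values are
    consulted, never the underlying state.
    """
    for b in action:
        # Every class [t] receives the old value of [bt], since (qb)t = q(bt).
        values = {t: values[source[(t, b)]] for t in values}
    return values


# Hand-written update graph for the hypothetical 3-state cycle.
# Classes: "p", "sp", "ssp"; prepending "s" to "ssp" gives "sssp", equivalent to "p".
source = {("p", "s"): "sp", ("sp", "s"): "ssp", ("ssp", "s"): "p"}

# Values at state 0: p is true; sp and ssp are false.
values = {"p": True, "sp": False, "ssp": False}

after_one = simulate(source, values, ["s"])
after_three = simulate(source, values, ["s", "s", "s"])
```

After one step the automaton is in state 1, where only "ssp" is true; after three steps the values return to those of state 0.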


2.6 Simple Assignment Automata 


We may regard the test equivalence classes as (local) state variables each of which is updated 
under the execution of some basic action with the value of one other (or the same) variable. 
We call such a structure a simple assignment automaton (SAA). The output of an SAA
consists of the current values of one or more of its variables, in this case the equivalence
classes of the predicates.

If we regard the current state of an SAA as the assignment of values to all the variables, 
then it is clear that every SAA is deterministic and finite state, and so can be simulated by 
some FSA. Conversely, our construction and the simulation theorem show that every FSA 
can be simulated by some SAA (the one we have constructed is the smallest such SAA). 


Thus, we have proved: 


Theorem 4 Every SAA can be simulated by an FSA, and every FSA can be simulated by 
an SAA. 


2.7 Characterizing Diversity and the Update Graph 


Neal Young and Dana Angluin have pointed out the following relationship between the 
update graph of an environment with a single predicate, and the original automaton: 

Let $\mathcal{E}$ be an environment with a single predicate, $(Q, B, \{p\}, q_0, \delta, \gamma)$, and let
$\mathcal{E}' = (Q', B, \{p'\}, q_0', \delta', \gamma')$ be defined as follows:

• $Q' = \{[t] \mid t \in T\}$

• $q_0' = [p]$

• $\delta'([t], b) = [bt]$, for $[t] \in Q'$, $b \in B$

• $\gamma'([t], p') = q_0 t$, for $[t] \in Q'$.

In this construction, $Q'$ is just the vertex set of $\mathcal{E}$’s update graph so that $|Q'| = D(\mathcal{E})$.
Furthermore, by the definition of $\delta'$, we see that the transition graph of $\mathcal{E}'$ is exactly this
update graph with all of the edges reversed in direction.

Theorem 5 Let $\mathcal{E}$ and $\mathcal{E}'$ be as described above. Then for any action $a \in A$, $q_0 a p = q_0' a^R p'$
where $a^R$ is the reverse of $a$.

Proof: Let $a = b_1 \ldots b_n$, where each $b_i \in B$. Then by the definition of $\delta'$, we have:

$$\begin{aligned}
q_0' a^R &= [p]\, b_n b_{n-1} b_{n-2} \ldots b_1 \\
&= [b_n p]\, b_{n-1} b_{n-2} \ldots b_1 \\
&= [b_{n-1} b_n p]\, b_{n-2} \ldots b_1 \\
&\;\;\vdots \\
&= [b_1 \ldots b_n p] \\
&= [ap].
\end{aligned}$$

Thus, $q_0' a^R p' = \gamma'(q_0' a^R, p') = \gamma'([ap], p') = q_0 a p$. ∎

The language $L(\mathcal{E})$ accepted by automaton $\mathcal{E}$ is the set of actions $a \in A$ which move $\mathcal{E}$
from its starting state to an “accepting” state in which the environment’s only predicate
is true. That is, $L(\mathcal{E}) = \{a \in A \mid q_0 a p = \text{true}\}$. Theorem 5 shows that the diversity of $\mathcal{E}$
is exactly the state size of the minimum FSA which accepts the reverse of $L(\mathcal{E})$.

When $\mathcal{E} = (Q, B, \{p\}, q_0, \delta, \gamma)$ is a permutation environment with a single predicate, the
diversity and update graph can be characterized in a different manner. In this case, the set
of basic actions generates a permutation group $G$ on the states of $\mathcal{E}$. Let $H$ be the subgroup
of $G$ which stabilizes the accepting states of $\mathcal{E}$. That is, $H$ consists of those group elements
$a$ of $G$ such that $qp = qap$ for all $q \in Q$. (Equivalently, $G$ is the permutation group on the
test equivalence classes of $\mathcal{E}$, and $H$ is the subgroup of $G$ which stabilizes $[p]$.)

We define the left coset graph of $H$ as follows: The vertices of the graph are the left
cosets of $H$, and an edge labeled $b$ is directed from $aH$ to $a'H$ iff $aH = ba'H$.

Then the following theorem shows that the diversity of $\mathcal{E}$ is exactly the index of $H$ in
$G$:

Theorem 6 The update graph of $\mathcal{E}$ is isomorphic to the left coset graph of $H$.


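Proof: For any two tests $xp$ and $yp$, we have:

$$\begin{aligned}
xp \equiv yp &\iff y^{-1}xp \equiv p \\
&\iff (\forall q \in Q)\; q y^{-1} x p = qp \\
&\iff y^{-1}x \in H \\
&\iff x \in yH \\
&\iff xH = yH.
\end{aligned}$$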


The generalization of both these characterizations to environments with multiple
predicates is straightforward.
Figure 2.1: The 5 × 5 Grid World


2.8 Two Example Environments 


The motivation for the introduction of the notion of diversity was the realization that many 
interesting “robot environments” can be modeled as finite automata which, although they 
have a large number of states, have low diversity. In this section, we make this point explicit 


by describing two particular small “robot environments”. 


2.8.1 n × n Grid World


Consider a robot on an $n \times n$ square grid (with “wraparound”, so that it is topologically a
torus). See Figure 2.1. The robot is on one of the squares and is facing in one of the four 
possible directions. Each square is either red, green, or blue. The robot can sense the color 
of the square it is facing. (This corresponds to the predicates of our previous development.) 

The following actions are available to the robot: It can paint the square it faces red, 
green, or blue. The robot can turn left or right by 90 degrees, or step forward one square in 
the direction it is facing. Stepping ahead has the curious side effect of causing the square 
it previously occupied to be painted the color of the square it has just moved to, so moving 
around causes the coloring to get scrambled up. 

This environment is a finite-state automaton which, even after reducing by factoring 
out some obvious symmetries, has an exponentially large (3°-1) number of states. 

However, the diversity of this environment is only $O(n^2)$. The state of this environment
is completely characterized by knowing the color of each square (using a robot-relative
coordinate system). It is not hard to devise a set of $O(n^2)$ tests whose results give all the
desired information. (For example, the square behind the robot is red if and only if the test
“turn-left turn-left see-red” is true.)

Figure 2.2: Update Graph of 3-bit Register World

Given this information, it is easy to see how to predict the state of the environment 
after a given sequence of actions. In fact, it becomes clear that this is the “natural” 
representation of this environment, and that the intuitive representation and simulation 
procedure one would use for this environment are captured almost exactly by the diversity- 
based representation and simulation procedure given in the previous section. 

We note that because of the “paint” operations, this environment is not a permutation
environment.


2.8.2 n-bit Register World 


In this environment, the robot is able to read the leftmost bit of an n-bit register. Its actions 
allow it to rotate the register left or right (with wraparound) or to flip the bit it sees. 

Clearly, this automaton consists of $2^n$ global states, but its diversity is only $2n$ since
there is one test for each bit, and one for the complement of each bit. We note that the 
register world is a permutation automaton. 
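
To make the $2n$ count concrete, the following sketch writes the register world as a simple assignment automaton with one variable per test class and checks it against a direct simulation of the register. It is an illustration under stated assumptions: the names `register_saa` and `step_register`, the encoding of a variable as a pair `(i, bit)` meaning "bit $i$ equals `bit`", and the convention that rotating left moves bit $i+1$ into position $i$ are all choices made here, not taken from the original.

```python
def register_saa(n):
    """Update table for the n-bit register world.

    source[(v, b)] is the variable whose old value variable v receives
    when basic action b is executed.
    """
    variables = [(i, bit) for i in range(n) for bit in (0, 1)]
    source = {}
    for (i, bit) in variables:
        source[((i, bit), "L")] = ((i + 1) % n, bit)  # rotate left
        source[((i, bit), "R")] = ((i - 1) % n, bit)  # rotate right
        # Flipping the leftmost bit swaps "bit 0 is 0" with "bit 0 is 1".
        source[((i, bit), "F")] = (i, 1 - bit) if i == 0 else (i, bit)
    return variables, source


def step_register(bits, b):
    """Directly apply basic action b to the register contents (a list of bits)."""
    if b == "L":
        return bits[1:] + bits[:1]
    if b == "R":
        return bits[-1:] + bits[:-1]
    return [1 - bits[0]] + bits[1:]


n = 3
variables, source = register_saa(n)
bits = [1, 0, 1]
values = {(i, bit): bits[i] == bit for (i, bit) in variables}

agree = True
for b in "LFRRFLLFR":
    bits = step_register(bits, b)
    values = {v: values[source[(v, b)]] for v in variables}
    agree = agree and all(values[(i, bit)] == (bits[i] == bit)
                          for (i, bit) in variables)
```

The table has exactly $2n$ variables, and each action only moves values between variables, which reflects the fact that the register world is a permutation automaton.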

The update graph of this environment is depicted in Figure 2.2. The name “1” in the 
figure refers to the predicate which returns true if the leftmost bit is a 1, and “L”, “R” 
and “F” refer to the actions which rotate left and right, and which flip the leftmost bit. In 


the current state, the register contains the values 101. The borders of the tests which are 
Chapter 3
Our Inference Procedure 


The inference procedure tries to construct a perfect model of its environment by meeting
the two requirements of the simulation theorem (Theorem 3). That is, the procedure first
infers the structure of the update graph, and then maneuvers itself into a state $q$ where it
knows the value $qt$ for every equivalence class $[t]$.

We will see that the first problem of constructing the update graph is by far the harder
of the two. We therefore begin with the second problem of determining the associated value
of each test equivalence class.


3.1 Inferring the Values of the Test Equivalence Classes 


Suppose then that the update graph’s structure is entirely known, and we now wish to 
determine the value associated with each vertex (equivalence class) of the graph. 

Assign to each vertex a variable $x_i$ which will stand for the value of that vertex in the
starting state. Since the execution of any action causes each vertex to be updated with the
value of one of the other vertices, we see that the value of each vertex in every future state
will just be one of these variables $x_i$. Our goal is to reach a state in which all of the variables
still in existence are known. (Some variables may disappear, but this is of no consequence
since, for perfect predictability, we only need to know the values of those that still exist.)

Initially, all of the variables are unknown. We can “solve” for a particular variable $x_i$
by causing one of the predicates $p$ to be updated with the value $x_i$. In this state, $x_i$ is the
value of $p$ which is directly observable.

If all of the existing variables are known, then we are done. Otherwise, there must be a
vertex $[t]$, where $t = ap$, with unknown value $x_i$. Then by executing action $a$, we move the
value of $t$ to predicate $p$, and thus we learn the value of variable $x_i$. Repeating this process,
we solve for all existing variables.

Note that the executed action sequence $a$ above need not be longer than the size of the
update graph $D(\mathcal{E})$. Further, each iteration of this loop decreases the number of unknown
variables by one. Since there are initially only $D(\mathcal{E})$ variables, we see that this part of the
inference problem can be solved in $O(D(\mathcal{E})^2)$ time.
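
The loop described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: the function name `infer_values`, the `source` table (with `source[(v, b)]` the vertex whose value $v$ receives when $b$ is executed), the callbacks `step` and `observe`, and the 3-bit register used as the black box are all assumptions introduced here.

```python
from collections import deque


def infer_values(vertices, pred_vertex, source, actions, step, observe):
    """Learn the current value of every vertex of a known update graph.

    pred_vertex maps each predicate to its vertex; step(b) executes a basic
    action on the black box; observe(p) returns the current value of p.
    """
    # For each vertex v find a test (a, p) in its class by tracing edges
    # backwards from the predicate vertices.
    read = {v: ((), p) for p, v in pred_vertex.items()}
    queue = deque(read)
    while queue:
        v = queue.popleft()
        a, p = read[v]
        for b in actions:
            u = source[(v, b)]  # u = [bt] where v = [t]
            if u not in read:
                read[u] = ((b,) + a, p)
                queue.append(u)

    holder = {v: v for v in vertices}  # start-state variable now held by each vertex
    known = {}
    while True:
        unknown = [v for v in vertices if holder[v] not in known]
        if not unknown:
            break
        v = unknown[0]
        var = holder[v]
        a, p = read[v]
        for b in a:
            step(b)
            holder = {w: holder[source[(w, b)]] for w in vertices}
        known[var] = observe(p)  # the variable has been moved onto predicate p
    return {v: known[holder[v]] for v in vertices}


# Hypothetical black box: a 3-bit register with hidden contents.
n = 3
vertices = [(i, bit) for i in range(n) for bit in (0, 1)]  # "bit i equals bit"
source = {}
for (i, bit) in vertices:
    source[((i, bit), "L")] = ((i + 1) % n, bit)
    source[((i, bit), "R")] = ((i - 1) % n, bit)
    source[((i, bit), "F")] = (i, 1 - bit) if i == 0 else (i, bit)

hidden = [1, 0, 1]


def step(b):
    if b == "L":
        hidden[:] = hidden[1:] + hidden[:1]
    elif b == "R":
        hidden[:] = hidden[-1:] + hidden[:-1]
    else:
        hidden[0] = 1 - hidden[0]


def observe(p):
    return hidden[0] == 1


values = infer_values(vertices, {"1": (0, 1)}, source, "LRF", step, observe)
```

Each pass through the outer loop learns one previously unknown variable, so the loop runs at most once per vertex, matching the counting argument in the text.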
We focus for the remainder of this chapter on the problem of inferring the structure of
the update graph.


3.2 An Inference Procedure Using an Oracle for Equivalence


We begin by supposing that we have an oracle available that can tell us whether two tests 
s and t are equivalent. 

Our algorithm (Figure 3.1) builds up the update graph, adding one edge at a time and 
creating new vertices when necessary, until no more edges can be added. Here, the program 
variable V represents the current set of vertices (equivalence classes). We assume that the 
predicates are inequivalent to one another, so initially V consists of one equivalence class 
for each of the predicates. 

The edges of the graph are represented by the function $\chi$: For each equivalence class $[t]$,
and each basic action $b$, the program computes the vertex at the tail of the unique $b$-edge
directed into $[t]$, so that $\chi([t], b) = [bt]$. If this is a vertex already in $V$, then an edge is
simply added; otherwise, a new vertex $[bt]$ is first created and added to $V$ before noting the
new edge.

Since $|V|$ is bounded by $D(\mathcal{E})$, we see that the procedure must halt, and in particular,
makes no more than

$$|B||V|^2 \le |B| D(\mathcal{E})^2$$

calls to the equivalence testing oracle.
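
A runnable sketch of this oracle-based procedure is given below. It is an illustration only: the function names, the representation of a test as a pair `(actions, predicate)`, and the brute-force oracle built from a known 3-bit register are assumptions made here. In the setting of the text the oracle is simply assumed to exist, and the predicates are assumed to be pairwise inequivalent.

```python
from itertools import product


def infer_update_graph(predicates, actions, equivalent):
    """Build the update graph using an equivalence oracle.

    Returns one representative test per class and a table chi with
    chi[(t, b)] the representative of [bt].
    """
    classes = [((), p) for p in predicates]
    chi = {}
    todo = [(t, b) for t in classes for b in actions]
    while todo:
        t, b = todo.pop()
        bt = ((b,) + t[0], t[1])
        match = next((s for s in classes if equivalent(bt, s)), None)
        if match is None:  # bt starts a new equivalence class
            classes.append(bt)
            todo.extend((bt, c) for c in actions)
            match = bt
        chi[(t, b)] = match
    return classes, chi


def make_register_oracle(n):
    """Hypothetical brute-force oracle for the n-bit register world."""
    states = list(product([0, 1], repeat=n))
    calls = [0]

    def step(bits, b):
        if b == "L":
            return bits[1:] + bits[:1]
        if b == "R":
            return bits[-1:] + bits[:-1]
        return (1 - bits[0],) + bits[1:]

    def value(bits, test):
        for b in test[0]:
            bits = step(bits, b)
        return bits[0] == 1  # the single predicate reads the leftmost bit

    def equivalent(s, t):
        calls[0] += 1
        return all(value(q, s) == value(q, t) for q in states)

    return equivalent, calls


equivalent, calls = make_register_oracle(3)
actions = ["L", "R", "F"]
classes, chi = infer_update_graph(["1"], actions, equivalent)
```

For the 3-bit register this finds six classes, and the number of oracle calls stays within the $|B||V|^2$ bound because each of the $|B||V|$ edges is resolved with at most $|V|$ comparisons.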


3.3 Determining If Two Tests Are Equivalent 


We now turn our attention to the problem of determining whether or not two tests are
equivalent. The inference procedure can prove that tests $s$ and $t$ are inequivalent if it can
find a state $q$ such that $qs \neq qt$; a single counterexample to the conjecture $s \equiv t$ suffices.

We wish to experiment with the available automaton $\mathcal{E}$ in order to prove $s \not\equiv t$. There
are two problems we face:

• (Accessibility of Counterexamples) It may be difficult or impossible to get the automaton
into a state $q$ where $qs \neq qt$, even if such states exist.

• (Irreversibility of Actions) Even if we can get the automaton into such a state $q$, once
we run test $s$ we are in general unable to “back up” so as to be able to run test $t$.


Let us define two tests to be compatible if the action sequence of one is a prefix of the
action sequence of the other. We note that irreversibility of actions is not a problem when
testing the equivalence of two compatible tests since they can be executed simultaneously.
In particular, a predicate is compatible with all other tests.

Input:
    $P$ - set of predicates
    $B$ - set of basic actions
    Oracle for testing if $s \equiv t$ for any tests $s$ and $t$
Output:
    $V$ - set of equivalence classes
    $\chi : V \times B \to V$ such that $\chi([t], b) = [bt]$
Procedure:
    $V \leftarrow \{[p] \mid p \in P\}$
    while $\chi([t], b)$ is undefined for some $[t] \in V$, $b \in B$ do
        if $bt \equiv s$ for some $[s] \in V$ then
            $\chi([t], b) \leftarrow [s]$
        else
            $V \leftarrow V \cup \{[bt]\}$
            $\chi([t], b) \leftarrow [bt]$
        endif
    end

Figure 3.1: An Inference Algorithm Using an Oracle for Equivalence of Tests.

We present solutions to these difficulties for the special class of permutation
environments, and then discuss progress toward a solution in the general case.


3.4 Determining Test Equivalence in Permutation Environments


Assume then that $\mathcal{E}$ is a permutation environment. It is easy to show that each action
permutes not only the global states, but the set of test equivalence classes as well. That is,

$$(\forall t \in T)(\forall s \in T)(\forall b \in B)\; s \equiv t \iff bs \equiv bt. \tag{3.1}$$


3.4.1 Overcoming Irreversibility of Actions


We show first how the problem of irreversibility of actions can be overcome by modifying
the control structure of the basic algorithm so that any test can effectively be made
compatible with any other test (Figure 3.2). This is essentially the same algorithm as in Figure 3.1;
every new equivalence class is being compared against (nearly) all the known equivalence
classes. However, the order in which these comparisons are made has been altered to ensure
that every test in $V$ can later be made compatible with any other test.
Input:
    $P$ - set of predicates
    $B$ - set of basic actions
    Oracle for testing if $s \equiv t$ for any tests $s$ and $t$
Output:
    $V$ - set of equivalence classes
    $\chi : V \times B \to V$ such that $\chi([t], b) = [bt]$
Procedure:
    $V \leftarrow \{[p] \mid p \in P\}$
    while $\chi([t], b)$ is undefined for some $[t] \in V$, $b \in B$ do
        $n \leftarrow 1$
        while $(\forall [s] \in V)\; b^n t \not\equiv s$ do
            $n \leftarrow n + 1$
        for $1 \le i < n$
            $V \leftarrow V \cup \{[b^i t]\}$
            $\chi([b^{i-1} t], b) \leftarrow [b^i t]$
        $\chi([b^{n-1} t], b) \leftarrow [s]$    {where $s \equiv b^n t$ and $[s] \in V$}
    end

Figure 3.2: A Modified Inference Algorithm for Permutation Environments


The following theorem shows that no equivalence class is added twice to $V$ by this
algorithm, and furthermore that the inner loop is guaranteed to halt:

Theorem 7 Let $[t]$ be a vertex in the program variable $V$, $b$ a basic action in $B$, and $n$ a
positive integer such that for all $[s] \in V$ and all $1 \le i < n$ we have $s \not\equiv b^i t$. Then the tests
$bt, b^2 t, \ldots, b^{n-1} t$ are pairwise inequivalent.

Proof: Suppose to the contrary that $b^i t \equiv b^j t$ for some $i, j$, $1 \le i < j < n$. Then by (3.1),
$t \equiv b^{j-i} t$, contradicting the hypothesis since $1 \le j - i < n$ but $[t] \in V$.

Essentially, the preceding theorem shows that the modified algorithm of Figure 3.2 is
“just as good” as that of Figure 3.1 in the sense that both will correctly infer the update
graph in roughly the same number of calls to the equivalence testing subroutine. Both
algorithms also share the property that, at all times, the value of any equivalence class $[t]$
in $V$ can be “read” directly simply by executing $t$. That is, if $t = ap$, $a \in A$, $p \in P$, then
by executing $a$, we pass the current value of $t$ to the predicate $p$ where it can be observed
directly.

The following theorem shows that the modified version of the algorithm has the
additional property that the value of any $[t]$ in $V$ can be not only “read,” but “set up” as well.
The theorem states that a path $a$ can always be found in the current state of the update
graph from some predicate class $[p]$ to $[t]$. Thus, by executing $a$, we pass the observable
value of $[p]$ to $[t]$. This property is crucial to the equivalence testing subroutine presented
below.


Theorem 8 Between each iteration of the outer loop of Figure 3.2, if $[t]$ is any vertex in $V$
then a path exists in the current state of the update graph from some predicate’s equivalence
class to $[t]$.


Proof: By induction on the number of iterations of the outer loop. 

Initially, V consists only of predicate equivalence classes, and so the property holds 
trivially. 

Suppose the theorem’s statement holds at the top of one iteration of the loop. Consider
the end of this iteration. We need to show there is a path from some predicate to each
new $[b^i t]$, $1 \le i < n$, added to $V$. We have $b^n t \equiv s$, for some $[s] \in V$, and therefore, by the
inductive hypothesis, we know of some $a \in A$, $p \in P$ for which $a$ is a path from $[p]$ to $[s]$.
Thus, $p \equiv as \equiv ab^n t \equiv (ab^{n-i}) b^i t$. In other words, $ab^{n-i}$ is a path to $[b^i t]$ from the predicate
equivalence class $[p]$.

Theorem 8 is used by the equivalence testing subroutine below. Although this procedure 
could be generalized for testing the equivalence of any two tests t and s, we assume here 
that the equivalence class of one of the tests, s, is already represented by a vertex [s] in V. 
Then there is a path $a$ from some predicate equivalence class $[p]$ to $[s]$; that is, $p \equiv as$. By
(3.1) then, $t \equiv s$ if and only if $at \equiv as \equiv p$. Note that $p$, being a predicate, is compatible with
$at$, and so the values of the two tests in a given state can be compared directly by executing
both simultaneously.


Here is the algorithm for testing if $s$ and $t$ are equivalent:

1. Find a path $a$ in the update graph from some predicate’s equivalence class $[p]$ to $[s]$.
2. Get the environment into some random state $q$.
3. Execute $p$ and $at$ (simultaneously) to find their values in $q$: If $qp \ne qat$, then $s \not\equiv t$.
4. Repeat steps 2 and 3 until confident that $s \equiv t$.
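
The four steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `RingWorld` is a hypothetical four-cell permutation environment with a single predicate, the path `a` is supplied by the caller rather than read from an update graph, and the walk length and number of trials are arbitrary small values rather than the bounds derived later.

```python
import random

class RingWorld:
    """Hypothetical toy permutation environment: 4 cells on a ring."""
    def __init__(self, rng):
        self.q = 0
        self.rng = rng

    def act(self, b):
        # "f" steps forward, anything else steps backward
        self.q = (self.q + (1 if b == "f" else -1)) % 4

    def sense(self, pred):
        # Only one predicate, "home", exists in this toy world.
        return self.q == 0

    def randomize(self, steps):
        # Step 2: at each step, either do nothing or take a random basic action.
        for _ in range(steps):
            if self.rng.random() < 0.5:
                self.act(self.rng.choice("fr"))

def found_counterexample(env, path, p, t, trials, walk_len):
    """One-sided test of whether p and (path)t differ somewhere.

    `path` is assumed to be a path a with p equivalent to a s, where [s] is the
    known class being compared with t = (action string, predicate name).
    Returns True as soon as a state is found where the values differ.
    """
    t_acts, t_pred = t
    for _ in range(trials):
        env.randomize(walk_len)          # step 2: reach some random state q
        vp = env.sense(p)                # value of p at q, read before moving
        for b in path + t_acts:          # execute a, then the action part of t
            env.act(b)
        if vp != env.sense(t_pred):      # step 3: the values differ at q
            return True
    return False                         # step 4: no difference observed

env = RingWorld(random.Random(0))
# s = ("f", "home"); the path "r" satisfies home = r f home in this world.
same = found_counterexample(env, "r", "home", ("fffff", "home"), 50, 20)
diff = found_counterexample(env, "r", "home", ("ff", "home"), 50, 20)
```

Because `p` is a bare predicate, its value is observed before any action is taken, which is what makes `p` and `at` comparable in a single experiment.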


Thus, we have overcome the problem of irreversibility of actions in permutation environments
by applying knowledge already gathered about the structure of the update graph to
effectively force the compatibility of any two tests which we might be interested in comparing
for equivalence. Still missing from this algorithm are a method of effectively randomizing
the environment (step 2), and a corresponding bound on the number of iterations of steps 2
and 3 necessary to confidently conclude that $s \equiv t$.

3.4.2 Overcoming Accessibility of Counterexamples


To rigorously prove that two tests are equivalent, we would have to show that their values 
are the same at each of the global states. In general, this is infeasible (one reason being 
that the state space may be enormous). Essentially, the preceding algorithm overcomes this 
difficulty by selecting a random sample from the state space. If at a single state the tests 
have different values, then the inference procedure may conclude with absolute certainty 
that the tests are inequivalent. Otherwise, the procedure concludes, with some possibility 
of error, that the tests are equivalent. We show below how this probability of error can be 
made vanishingly small. We prove that, in permutation environments, we have an adequate 
chance of finding a state in which the values of two inequivalent tests differ simply by taking 
an appropriate random walk. 

We begin with a general discussion of random walks on directed graphs and of certain
properties of point symmetric graphs (defined below), and next apply these results in
proving a probabilistic upper bound on the running time of our algorithm.


3.4.2.1 Random Walks on Directed Graphs 


We are concerned with random walks on a strongly connected (every vertex reachable from 
every other vertex) directed graph G which has n vertices and which is regular of degree d in 
the sense that every vertex has in-degree and out-degree equal to d. G may have self-loops 
and multiple edges between vertices. Let $A = \{a_{ij}\}$ denote the adjacency matrix of $G$, so
that $a_{ij}$ is the number of edges from vertex $i$ to vertex $j$.

The random walk we are concerned with has the following form. We begin at an arbitrary 
vertex. At each step we first flip a fair coin. If we see “heads” then we stay at the current 
vertex, otherwise we pick one of the d outgoing edges uniformly at random and traverse it. 


This random walk defines a finite Markov chain with transition matrix

$$B = \frac{1}{2}\left(I + \frac{1}{d}A\right). \tag{3.2}$$


If we let $p_t$ denote the vector whose $i$-th component $p_{t,i}$ is the probability of the Markov
chain being in state $i$ (i.e., at vertex $i$) at time $t$, then we have the recurrence:

$$p_{t+1}^T = p_t^T B. \tag{3.3}$$


The initial vector $p_0$ describes the probability of picking each vertex as the starting vertex.
We observe that the matrix $B^m$ contains all positive entries for some positive integer
$m$. Thus by the Perron-Frobenius theorem, $B$ has an eigenvalue $\lambda_1 = 1$ with multiplicity
1 and corresponding eigenvector $\pi$. For any other eigenvalue $\lambda$ of $B$, $|\lambda| < 1$. Also, it is
easy to see that since $G$ is regular, the eigenvector $\pi = (\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n})$. This is the stationary
distribution for our Markov chain.
As we take more and more steps in our random walk, the probability vector $p_t$ converges
to $\pi$; we lose track of where we began and are more or less equally likely to be at any vertex.
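
The convergence can be observed numerically. The following minimal sketch builds the lazy-walk matrix $B = \frac{1}{2}(I + \frac{1}{d}A)$ for a small example graph and iterates the recurrence $p_{t+1}^T = p_t^T B$; the directed 5-cycle, the helper names, and the choice of $c = 1$ are assumptions made only for illustration.

```python
import math

def lazy_walk_matrix(adj):
    """B = (I + A/d) / 2 for a d-regular directed multigraph with edge counts adj."""
    n = len(adj)
    d = sum(adj[0])                      # out-degree, the same at every vertex
    return [[0.5 * ((i == j) + adj[i][j] / d) for j in range(n)]
            for i in range(n)]

def step(p, B):
    """One step of the recurrence: the row vector p times the matrix B."""
    n = len(p)
    return [sum(p[i] * B[i][j] for i in range(n)) for j in range(n)]

def error_norm(p):
    """Euclidean distance from p to the uniform distribution."""
    n = len(p)
    return math.sqrt(sum((x - 1.0 / n) ** 2 for x in p))

# Example graph (an assumption): the directed 5-cycle, regular of degree d = 1.
n = 5
adj = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
B = lazy_walk_matrix(adj)

p = [1.0] + [0.0] * (n - 1)              # start at vertex 0 with certainty
errors = [error_norm(p)]
for _ in range(1 * 1 * n * n):           # t = c * d * n^2 steps with c = 1, d = 1
    p = step(p, B)
    errors.append(error_norm(p))
```

On this example the recorded `errors` never increase, and the final distance from the uniform distribution is well below $e^{-1}$.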


Theorem 9 If $t = cdn^2$, then

$$\|p_t - \pi\| \le e^{-c} \tag{3.4}$$

where $\|x\|$ is the ordinary Euclidean norm.


Proof: Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $B$, where $\lambda_1 = 1$ and the other eigenvalues are
arbitrary complex numbers arranged in order so that

$$1 = \lambda_1 > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|. \tag{3.5}$$

(We note that if $\lambda$ is an eigenvalue of $B$, then so is $\bar{\lambda}$, since $B$ is real.)
We now argue that the theorem follows if it can be shown that the maximum magnitude
of any of $\lambda_2, \ldots, \lambda_n$ is bounded above by $1 - \frac{2}{dn^2}$.

Indeed, if we let $q_t = p_t - \pi$ denote the “error vector” at time $t$, then it follows that

$$\|q_{t+1}\| \le |\lambda_2| \cdot \|q_t\|. \tag{3.6}$$

(This follows, for example, from the algebraic treatment of finite Markov chains given in
[6].)

Since $\|q_0\| \le 1$, it follows from our assumption that $|\lambda_2| \le 1 - \frac{2}{dn^2}$ that

$$\|q_t\| \le \left(1 - \frac{2}{dn^2}\right)^t \le e^{-2t/(dn^2)}. \tag{3.7}$$

This is at most $e^{-c}$ if $t = cdn^2$, as desired.
We now proceed to show that the maximum magnitude among $\lambda_2, \ldots, \lambda_n$ is at most
$1 - \frac{2}{dn^2}$.


Let $\kappa_1, \ldots, \kappa_n$ denote the eigenvalues of the matrix $\frac{1}{d}A$, where $\kappa_1 = 1$. By the Perron-
Frobenius theorem, all of the $\kappa_i$ lie on or within the unit circle. Assuming that the indices
for the $\kappa_i$’s have been chosen appropriately, it follows from equation (3.2) that

$$\lambda_i = \frac{1}{2} + \frac{1}{2}\kappa_i \quad \text{for } i = 1, \ldots, n. \tag{3.8}$$

Therefore, all of the $\lambda_i$’s lie within the circle in the complex plane with center at $\frac{1}{2}$ and
radius $\frac{1}{2}$.


We begin with a result due to Fiedler [7] that applies to any doubly stochastic matrix;
in our case, the matrix $\frac{1}{d}A$. (This is a combination of his Lemma 3.5 and his Theorem 3.2.)
Define an eigenvalue $\kappa$ of $\frac{1}{d}A$ to be “nonstochastic” if $\kappa \ne 1$. Fiedler’s result says that the
real part of any nonstochastic eigenvalue $\kappa$ of $\frac{1}{d}A$ is bounded above:

$$\mathrm{Re}(\kappa) \le 1 - 2\left(1 - \cos\left(\frac{\pi}{n}\right)\right)\mu(S) \tag{3.9}$$

where $S = \frac{1}{2}\left(\frac{1}{d}A^T + \frac{1}{d}A\right)$ and $\mu(S)$ is defined by

$$\mu(S) = \min_{\emptyset \ne X \subsetneq V} \; \sum_{i \in X,\ j \in V - X} S_{ij} \tag{3.10}$$

where $V = \{1, \ldots, n\}$. Here $\mu(S)$ is a “measure of the irreducibility” of $S$; $S$ can be
interpreted as the adjacency matrix for a graph $H$ which is the average of the graph $G$ and
its inverse, and $\mu(S)$ is the minimum (over all partitions of the vertex set $V$ into nonempty
parts $X$ and $V - X$) of the sum of the weights of the edges going from $X$ into $V - X$.


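In our case we can only say that

$$\mu(S) \ge \frac{1}{d} \tag{3.11}$$

since all we know is that $G$ is strongly connected. Thus Fiedler’s theorem implies that

$$\mathrm{Re}(\kappa) \le 1 - 2\left(1 - \cos\left(\frac{\pi}{n}\right)\right)\frac{1}{d}. \tag{3.12}$$

Since $\cos\left(\frac{\pi}{n}\right) \le 1 - \frac{4}{n^2}$ for $n \ge 2$, we have

$$\mathrm{Re}(\kappa) \le 1 - \frac{8}{dn^2}. \tag{3.13}$$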
It now follows that if $\lambda$ is a “nonstochastic” eigenvalue of $B$, then $\lambda$ must lie in the
shaded region of the complex plane shown in Figure 3.3: From equation (3.8) and the fact
that $|\kappa| \le 1$, we see that $\lambda$ must lie inside the circle $C$ in the figure. Furthermore, combining
equations (3.13) and (3.8), we see that the real part of $\lambda$ is bounded above, and so $\lambda$ must
lie to the left of some line $L$. Thus, applying some elementary trigonometry, we obtain

$$|\lambda| \le \sqrt{1 - \frac{4}{dn^2}} \le 1 - \frac{2}{dn^2}. \tag{3.14}$$


If we set $c$ to be approximately $\log(n)$, we have the following easy corollary:

Corollary 1 After $t = dn^2\log(n)$ steps we have a chance of at least $\frac{1}{2n}$ of being at any
given vertex.
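
The corollary can be checked with a short calculation. The sketch below assumes that the bound in (3.7) reads $\|q_t\| \le e^{-2t/(dn^2)}$, that the corollary's guarantee is $\frac{1}{2n}$, and that logarithms are natural; these are readings of the text rather than statements taken verbatim from it.

```latex
\[
  \|p_t - \pi\| \;\le\; e^{-2t/(dn^2)} \;=\; e^{-2\log n} \;=\; \frac{1}{n^2}
  \qquad \text{for } t = dn^2 \log n ,
\]
\[
  p_{t,i} \;\ge\; \frac{1}{n} - \|p_t - \pi\|
          \;\ge\; \frac{1}{n} - \frac{1}{n^2}
          \;\ge\; \frac{1}{2n}
  \qquad \text{for } n \ge 2 .
\]
```

The first inequality in the second line holds because no single component of $p_t - \pi$ can exceed the Euclidean norm of the whole vector.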
Figure 3.3: Region of Complex Plane in Which Nonstochastic Eigenvalues May Lie

3.4.2.2 Point Symmetric Graphs


Next, we turn to a discussion of point symmetric graphs, and prove a lemma needed in
proving Theorem 10 below.


Definition 2 A graph $G$ is point symmetric if for all pairs of vertices $v, w$ in $G$, there
exists an automorphism on $G$ which maps $v$ to $w$.


Definition 3 A bipartite graph $G$ is bipartite point symmetric if for all pairs of vertices
$v, w$ on the same side of the graph, there exists an automorphism on $G$ which maps $v$ to $w$.


It is easy to see that all vertices have the same degree in a point symmetric graph, and 
likewise for all vertices on the same side of a bipartite point symmetric graph. 


The proof of the following lemma is due in large part to Satish Rao: 


Lemma 1 Let $G = (V, E)$ be an undirected, connected point symmetric or bipartite point
symmetric graph with degree at least $d$ at every vertex. Let $m$ be the minimum number of
edges that must be removed to separate $G$ into two non-empty pieces. Then $m \ge d$.


Proof: For arbitrary subsets $S, T$ of vertices, let $D(S,T)$ be the number of edges connecting
points in $S$ with points in $T$, and let $C(S)$ be the number of edges cut in separating $S$ from
the rest of the graph:

$$D(S,T) = |\{\{s,t\} \in E \mid s \in S,\ t \in T\}|,$$
$$C(S) = D(S, V - S).$$

Then $m = \min\{C(S) \mid \emptyset \ne S \subsetneq V\}$.

Suppose, to the contrary of the lemma’s statement, that $m < d$, and let $S$ be the
smallest non-empty subset of $V$ for which $C(S) = m$.

Since C(S) > 0, S contains some boundary point j, that is, a vertex j connected to some 
vertex outside of S. 

We claim $S$ contains an interior point $i$ as well, i.e., a vertex not on the boundary. If
this were not the case, then all $k = |S|$ vertices in $S$ would be boundary points, so that $k \le m$.
The number of edges connecting vertices in $S$ would then be at least

$$\frac{dk - m}{2} > \frac{dk - d}{2} = \frac{d(k-1)}{2} \ge \frac{k(k-1)}{2}.$$

Clearly, it is impossible for more than $\binom{k}{2}$ edges to connect $k$ points.


In the case that $G$ is only bipartite point symmetric, we can assume that $i$ and $j$ are
on the same side of the graph. Suppose otherwise. Then the $k_1$ vertices on one side of the
graph are interior, and the $k_2$ vertices on the other side are boundary points. Thus $k_2 \le m$,
and so the number of interior edges is at most $k_1 k_2 \le k_1 m < k_1 d$, a contradiction since the
$k_1$ vertices on the first side are interior.

Therefore, in either case, we may conclude that there is an automorphism $\sigma$ on $G$
mapping $i$ to $j$. Let $S'$ be the image of $S$ under $\sigma$. Then $|S| = |S'|$ and $C(S') = C(S) = m$.
Since $j$ is a boundary point of $S$ but an interior point of $S'$, $S \ne S'$.

Let $I = S \cap S'$, $X = S - I$, $X' = S' - I$, and $Z = V - (S \cup S')$ (Figure 3.4). Since $j \in I$,
$I$ is not empty. The sets $X$ and $X'$ are also non-empty since $S$ and $S'$ are unequal sets of
the same size. Therefore, $0 < |X| < |S|$ and so $C(X) > m$. Similarly, $C(X') > m$.

We have: 


$$C(S) = D(X,Z) + D(X,X') + D(I,X') + D(I,Z)$$
$$C(S') = D(X',Z) + D(X',X) + D(I,X) + D(I,Z)$$
$$C(X) = D(X,Z) + D(X,X') + D(X,I)$$
$$C(X') = D(X',Z) + D(X',X) + D(X',I)$$


Thus, we have the following contradiction:

$$2m = C(S) + C(S') = C(X) + C(X') + 2D(I,Z) \ge C(X) + C(X') > 2m.$$

Figure 3.4: Construction for Lemma 1


3.4.2.3 Finding Counterexamples with Random Walks 


With these results, we are finally able to prove: 


Theorem 10 Let $s$ and $t$ be two inequivalent tests of a permutation environment $\mathcal{E}$ of
diversity $D$. We take a random walk of length $2|B|D^4\log(D)$ beginning at an arbitrary
start state. At each step, with equal probability, we either do nothing, or we execute a
uniformly and randomly chosen basic action from $B$. Then the probability that the values
of $s$ and $t$ differ at the state where we complete this walk is at least $\frac{1}{2D}$.


Proof: Consider the graph $P$ defined as follows: The vertices of $P$ are all ordered pairs
$([as],[at])$ for all $a \in A$, and an edge $b$ is directed from vertex $([s_1],[t_1])$ to $([s_2],[t_2])$ iff
$s_1 \equiv bs_2$ and $t_1 \equiv bt_2$. Clearly, $P$ has no more than $D(D-1) < D^2$ vertices. Further, as
with the update graph, the vertices are permuted by each basic action, so there is exactly
one ingoing and one outgoing edge for each basic action at each vertex. (Alternatively, $P$
can be viewed as the left coset graph of the subgroup which stabilizes both $[s]$ and $[t]$.)
Let $a = b_1 \ldots b_n$ be the chosen random sequence of basic actions, and let $q$ be the
starting state. When $a$ is executed, the environment moves to state $qa$ where $s$ and $t$ have
the values $qas$ and $qat$. In other words, $s$ and $t$ are updated with the values of $as$ and $at$ in
state $q$. The tests $as$ and $at$ have different values at $q$ if and only if $s$ and $t$ have different
values at the completion of $a$.

Thus, we can regard the reverse of the random walk $a$ as an equally random
walk through $P$; at each step, we move from vertex $([b_{i+1} \ldots b_n s],[b_{i+1} \ldots b_n t])$ to
$([b_i b_{i+1} \ldots b_n s],[b_i b_{i+1} \ldots b_n t])$ by traversing the reversed edge $b_i$, finally arriving at
$([as],[at])$.

Since we are taking a random walk of just the form and length described in the hypothesis 
of Corollary 1 for a graph such as $P$ with at most $D^2$ vertices, and both indegree and
outdegree equal to |B| at each vertex, we see that our (reversed) random walk has a roughly 
equal chance of finishing at any of the vertices of P. 

We now argue that, for at least $\frac{1}{D}$ of the vertices $([s'],[t'])$ of $P$, we have $qs' \ne qt'$. This,
combined with the preceding arguments, will prove the lower bound on the probability of
finding a counterexample.

Let the orbit of any test $u$ be the set $O_u = \{[au] \mid a \in A\}$.

Consider the graph $C$ defined as follows: The vertex set $V$ of $C$ is the union $O_s \cup O_t$,
and an (unlabeled) edge is directed from $[s']$ to $[t']$ if $([s'],[t'])$ is a vertex of $P$, that is, if
$s' \equiv as$ and $t' \equiv at$ for some action $a \in A$.

We argue first that $C$ is (bipartite) point symmetric. If $[s_1],[s_2]$ are in $O_s$, then there
is some action $a$ for which $s_2 \equiv as_1$. Let $\sigma$ be the permutation mapping each vertex $[u]$
to $[au]$. Then $\sigma$ maps $[s_1]$ to $[s_2]$ and furthermore defines an automorphism on $C$ since
if $([s'],[t'])$ is an edge, then so are $([as'],[at'])$ and $([a^{-1}s'],[a^{-1}t'])$. Similarly, for any two
tests in $O_t$, there is an automorphism on $C$ mapping the first to the second.

By the definition of orbits, we have that $O_s$ and $O_t$ are either equal or disjoint. In the
former case, the preceding argument shows that $C$ is point symmetric. In the other case,
$C$ is a bipartite point symmetric graph.

In either case, let $d_s$ be the outdegree of each vertex in $O_s$ (necessarily the same at each
vertex by the preceding argument) and similarly define $d_t$ as the indegree of each vertex in
$O_t$. Then the number of edges in $C$ is exactly $d_s|O_s| = d_t|O_t|$. Let $d = \min\{d_s, d_t\}$.

Let $X$ be the set of vertices $[u]$ of $C$ for which $qu$ is true. Then each edge connecting
(in either direction) a vertex in $X$ with another in its complement corresponds to a vertex
$([s'],[t'])$ in $P$ for which $qs' \ne qt'$. We therefore would like to show that at least $\frac{1}{D}$ of the
edges of $C$ connect $X$ to its complement. This will be the case if we can find at least $d$ such
edges, since $C$ has at most $dD$ edges.

Since $s \not\equiv t$, there is at least one such edge. Let $C'$ be the subcomponent of $C$ connected
to this edge. The graph $C'$ is still (bipartite) point symmetric. Therefore, simply regarding
the edges of $C'$ as undirected, and applying Lemma 1 to it, we see that at least $d$ edges are
cut in separating $X$ from its complement in $C'$, as desired.
This completes the proof of the theorem. ∎


Using this result, we can show the following theorem, the main result of this section: 


Theorem 11 Let $\mathcal{E}$ be a permutation environment with diversity $D$. Given $\epsilon > 0$, our
algorithm will infer the structure of $\mathcal{E}$ in time

$$O\left(|B|^2 D^7 \log\left(\frac{|B|D}{\epsilon}\right) \log(D)\right) \tag{3.15}$$

with probability of error less than $\epsilon$.


Proof: The preceding theorem states that the probability of distinguishing two inequivalent
tests, having taken an appropriate random walk, is at least $\frac{1}{2D}$. Thus, the probability of
failing to do so after $n$ trials is no greater than $(1 - \frac{1}{2D})^n$. This error probability is bounded
by a parameter $\delta$ when

$$n \ge \frac{\log \delta}{\log\left(1 - \frac{1}{2D}\right)}.$$

As many as $I = |B|D^2$ inequivalence tests may be made in the course of inferring the
automaton. The probability, then, of successfully distinguishing all of the inequivalent pairs
of tests is at least $(1 - \delta)^I$. Our goal is to make this probability at least $1 - \epsilon$. We have
been given $\epsilon$ and choose $\delta = \frac{\epsilon}{I}$. Then

$$1 - \epsilon = 1 - I\delta \le (1 - \delta)^I$$

as desired.
Finally, if we choose $n \ge 2D\log\frac{I}{\epsilon}$ then our probability of error on an individual
experiment is sufficiently small since

$$2D\log\frac{I}{\epsilon} = \frac{\log\frac{\epsilon}{I}}{-\frac{1}{2D}} \ge \frac{\log\frac{\epsilon}{I}}{\log\left(1 - \frac{1}{2D}\right)} = \frac{\log\delta}{\log\left(1 - \frac{1}{2D}\right)}.$$

Here, we have used the fact that $\log z \le z - 1$ for all $z > 0$. (In particular, if $z < 1$ then
$\log z \le z - 1 \Rightarrow \frac{1}{z-1} \le \frac{1}{\log z}$. Above, we have applied this formula with $z = 1 - \frac{1}{2D}$.)
Hence, our procedure requires $I$ inequivalence tests. Each of these requires up to $2D\log\frac{I}{\epsilon}$
experiments, each of which can involve a random walk of length $2|B|D^4\log(D)$. (The time
to run the actual experiment, or to determine which experiment is to be performed next, is
negligible.) We thus arrive at the running time stated in the theorem. ∎
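
To get a feel for the size of these worst-case bounds, the three quantities in the proof can be computed directly. The function below is only an illustrative calculator, not part of the inference procedure: the name `experiment_budget` is hypothetical, it reads the bounds as $I = |B|D^2$ tests, $2D\log(I/\epsilon)$ trials per test and walks of length $2|B|D^4\log(D)$, and it assumes natural logarithms.

```python
import math

def experiment_budget(num_basic_actions, diversity, epsilon):
    """Worst-case counts as read from the proof of Theorem 11 (natural logs assumed)."""
    B, D = num_basic_actions, diversity
    tests = B * D ** 2                                       # I inequivalence tests
    trials = math.ceil(2 * D * math.log(tests / epsilon))    # trials per test
    walk = math.ceil(2 * B * D ** 4 * math.log(D))           # steps per random walk
    return tests, trials, walk, tests * trials * walk

# Example: 3 basic actions, diversity 4, error probability 0.01.
tests, trials, walk, total_steps = experiment_budget(3, 4, 0.01)
```

Even for this tiny example the total number of steps runs into the millions, which is consistent with the remark below that much shorter walks suffice in practice.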
Thus we have completed our algorithm by exhibiting an effective random walk technique.
Note that, implicitly, we have assumed that the diversity, or an upper bound $D_{max}$ on the
diversity, has been given to the inference procedure since the diversity must be known to
calculate the length and number of random walks needed. If no such bound is available, the
algorithm can be executed repeatedly with $D_{max} = 1, 2, 4, 8, \ldots$ If $D_{max}$ is smaller than the
true diversity $D$, then either the algorithm will be unable to build a small enough update
graph, or it will construct an incorrect update graph which will sooner or later make a wrong
prediction. When either of these occurs, we double $D_{max}$ and run the inference procedure
again.

The bounds stated in the preceding theorems have been tightened significantly since
our original presentation of the algorithm. Empirically, however, we have found that much
shorter random walks and far fewer experiments are sufficient, and we therefore conjecture
that the bounds are still not tight.


3.5 Determining Test Equivalence in General 


We discuss now the general case in which $\mathcal{E}$ is not necessarily a permutation environment.
We don’t at the moment know how to handle in a rigorous manner the first difficulty of
finding a state in which two inequivalent tests can be distinguished, even if we assume that
$\mathcal{E}$ is strongly connected. Nonetheless, in practice this may often not be a concern; if two
tests $s$ and $t$ are inequivalent then there are usually many easily reached states $q$ such that
$qs \ne qt$.

We now propose a technique for handling the irreversibility of actions in general
environments.

We need to figure out how to get $\mathcal{E}$ into a state $q$ where we know the value $qt$ of the test
$t$, even though we haven’t run test $t$ yet, so that we can run test $s$ instead.

Let $t = ap$; here $a$ is the action part of test $t$ and $p$ is the predicate.

Suppose we run action $a$ repeatedly. Eventually the predicate $p$ will exhibit periodic
behavior. Once we know that this periodic behavior has been established, and once we
know the period $m$ of this behavior, then we can figure out the value of $qt$ for the current
state $q$ without having to run the test $t$.

We have to address the problem that for general finite-state automata, it is well known
that the eventual period can be as large as $|Q|$, the size of the automaton. This would be a
serious problem for our proposed approach, since the size can be an exponential function of
the diversity. However, the following theorem shows that the period is no larger than the
diversity.

Figure 3.5: The Rubik’s Cube World


Theorem 12 Let $D = D(\mathcal{E})$. If we run action $a$ repeatedly, then the behavior of predicate
$p$ will exhibit transient behavior for no more than $D$ steps, and then will settle down into
periodic behavior with period at most $D$.


Proof: This follows easily from our simulation theorem (Theorem 3). Consider the sequence
of tests $p, ap, a^2p, \ldots, a^Dp$. Since there are only $D$ test equivalence classes, by the pigeonhole
principle, at least two of these tests are equivalent. Say $a^ip \equiv a^jp$ where $i < j$. Recall
that $p$ is passed its value from $a^ip$ under action $a^i$. Therefore, $p$ will exhibit transient
behavior for at most the first $i$ executions of $a$, and will then settle into periodic behavior
with period $j - i$. ∎

To complete the description of our inference procedure, we suppose as above that an
upper bound $D_{max}$ is available on the diversity $D(\mathcal{E})$ of the automaton being inferred.


To run the algorithm of Figure 3.1, we need a way to test $s$ and $t = ap$ for inequivalence.
The following procedure is suggested by the previous theorem:

• Get the environment into some random state.

• Run action $a$ for $D_{max}$ steps. (This is to eliminate transient behavior of $p$.)

• Run action $a$ for $2D_{max}$ steps, keeping track of $qp$ for each state $q$ reached.

• Use the information gathered in the previous step to determine the period of predicate
  $p$ under action $a$. Use this information to determine whether $qt$ is true or false in
  the current state $q$ (without running test $t$).

• Run test $s$ to determine $qs$.

• If $qs \ne qt$, then $s \not\equiv t$.

• Repeat until confident that $s \equiv t$.

Figure 3.6: The Little Prince’s Planet


As before, this is a one-sided test: a report that $s \not\equiv t$ is certainly correct, but a report
that $s \equiv t$ may be erroneous.
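
The procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: `ToyEnv` is a hypothetical six-state environment that is not a permutation environment, the test $s$ is taken to be a bare predicate so that running it is a single observation, and "some random state" is simulated by picking a state directly.

```python
import random

def predict_next(obs, d_max):
    """Predict the next value of p from values observed after the transient.

    obs[k] is the value of p after k further executions of a. The smallest
    period m <= d_max consistent with obs is used to extrapolate one step.
    """
    for m in range(1, d_max + 1):
        if all(obs[k] == obs[k + m] for k in range(len(obs) - m)):
            return obs[len(obs) - m]
    raise ValueError("no period <= d_max fits; d_max may be too small")

class ToyEnv:
    """Hypothetical environment: under 'a', states 0, 1, 2 are transient and
    states 3 -> 4 -> 5 -> 3 form a cycle."""
    def __init__(self, rng):
        self.rng, self.q = rng, 0

    def randomize(self):
        self.q = self.rng.randrange(6)    # stand-in for "some random state"

    def act(self, b):
        self.q = self.q + 1 if self.q < 3 else 3 + (self.q - 2) % 3

    def sense(self, pred):
        return {"p": self.q == 5, "s_eq": self.q == 4, "s_ne": self.q == 3}[pred]

def found_counterexample(env, a, p, s_pred, d_max, trials):
    """One-sided test of s against t = ap; True means s and t are inequivalent."""
    for _ in range(trials):
        env.randomize()
        for _ in range(d_max):            # eliminate transient behavior of p
            env.act(a)
        obs = [env.sense(p)]
        for _ in range(2 * d_max):        # record p along 2 * d_max more steps
            env.act(a)
            obs.append(env.sense(p))
        qt = predict_next(obs, d_max)     # value of t = ap here, without running t
        if env.sense(s_pred) != qt:       # run s (a bare predicate in this sketch)
            return True
    return False

env = ToyEnv(random.Random(1))
same = found_counterexample(env, "a", "p", "s_eq", 6, 30)   # s_eq is equivalent to ap
diff = found_counterexample(env, "a", "p", "s_ne", 6, 30)   # s_ne is not
```

Using the smallest consistent period is safe here because the observation window is at least twice as long as any period up to `d_max`, so any two periods that fit the window predict the same next value.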

The test must be re-run a number of times before concluding that $s \equiv t$. To make the
trials as independent as possible, we may:

• Take a “random walk in $\mathcal{E}$” between each trial, by executing some randomly chosen
  sequence of actions.

• Repeatedly execute an action $ab$ instead of just $a$ in each trial, where $b$ is an arbitrarily
  chosen action in $A$.

Figure 3.7: The Car Radio World


These heuristics may not help to find a counterexample in all cases, but they are reasonably
effective in practice. (We hope to prove the effectiveness of these techniques as we did in
the permutation environment case for a broader class of finite automata.)

Also, for efficiency, we are in many instances able to force compatibilities as in the
permutation environment case, and can often compare many tests against many other tests
in single experiments. These heuristics lead to many-fold improvements of our running
times.


3.6 Experimental Results 
3.6.1 Three More Toy Environments 


Consider the following permutation environment based on “Rubik’s Cube” (Figure 3.5). 
The robot is allowed to see only three of the fifty-four tiles: a corner tile, an edge tile and 
a center tile, all on the front face. Each of these three senses can indicate any one of six 
colors. The robot may rotate the front face, and may turn the whole cube about the x and
y axes. (By reorienting the cube he can thus turn and view any of the six faces.)

As another example environment, consider a robot just delivered to the “Little Prince” 
[16] on his home planet (an asteroid, really). This planet has a rose and a volcano, which 
the robot can see when he is next to them; the available sense values are “See Volcano” and 
“See Rose”. The planet is very small—it takes only four steps to go all the way around it. 
The basic actions available to the robot are “Step Forward”, “Step Backward”, and “Turn
Around”. See Figure 3.6. In the state shown, the robot has no sensations, but he will see
the volcano if he takes a step forward, and will see the rose if he takes a step backwards (or
turns around and takes a step forwards).

Table 3.1: Experimental Results

In the last micro-world, the robot can fiddle with the controls of a car radio (see
Figure 3.7) and can detect what kind of music is being played. There are three distinctive
stations which define the robot’s sensations: rock, classical, and news. The robot can use
the auto-tune to dial the next station to the left or right (with wrap-around), or can select
one of the two programmed stations, or can set one of these two program buttons to the
current station. Unlike the last two environments, the Car Radio World is not a permutation
environment because of the robot’s ability to program stations.


3.6.2 Summary of Results


Table 3.1 summarizes how our procedures handled these environments, as well as the 
5x5 Grid World environment and the 32-bit Register environment described in Section 2.8. 

The most complicated environment (Rubik’s Cube) took less than two minutes of CPU 
time to master—we consider this very encouraging. 

Rubik’s Cube, the Little Prince and the 32-bit Register Worlds were explored with an
implementation (version “P”) which exploits the special properties of permutation
environments, but which only compares one pair of tests at a time. All worlds were explored
as well by version “M”, which tries to compare many tests against many other tests in a
single experiment. The run times given are in seconds. The last three columns give the
number of basic actions taken by the robot, the number of sense values asked for, and
the number of experiments performed. (An experiment is defined loosely as a sequence of
actions and senses from which the robot deduces a conclusion about equivalence between
tests. Information about several tests may be obtained in a single experiment, and the same
sequence of actions and senses may be repeated several times, each repetition counting as
one experiment. Also, we have generalized the notion of a test here to allow the function $\gamma$
to map $Q \times P$ into an arbitrary set of sensations, not necessarily the set {true, false}. For

Chapter 4


Inference of Visible Simple Assignment 
Automata with Planned Experiments 


In this chapter, we focus on the problem of planning experiments when trying to infer the 
structure of a finite automaton by experimentation. In the preceding chapters, we were 
concerned with the same general problem. However, our focus was on the identification of 
hidden state variables, rather than on the planning of experiments. 

The experimental technique used in the preceding chapters was a simple one based on 
the properties of random walks. As a consequence, we could only prove our techniques to 
be effective for a restricted class of automata (permutation automata). The key difficulty in 
extending our proof is that random walks are not in general guaranteed to get the automaton 
into a desired state (or set of states) with sufficiently high probability. For the general case, 
it seems clear that experiments have to be planned carefully. 

This chapter does not address the issue of hidden state variables; we assume that all state 
variables are visible to the observer. We make this simplification to bring to the foreground 
the issues regarding the planning of experiments. Of course, at some point we would like 
to merge the techniques developed here with those for identifying hidden state variables. 

Aside from this difference in the visibility of state variables, the automata we study are
structurally identical to those studied up to this point. Recall from Section 2.6 that every
finite-state deterministic system can be represented as a simple assignment automaton in
which each variable stands for one test equivalence class. In this chapter, to simplify our
discussion, we drop the equivalence class terminology, and instead formally redefine an
environment as a simple assignment automaton.


4.1 Definitions 


We define a simple assignment automaton to be a tuple $(V, B, \delta, q_0)$ such that

• $V = \{x_1, \ldots, x_n\}$ is a finite nonempty set of $n$ binary state variables,

• $B$ is a finite nonempty set of input symbols, also called basic actions,

• $\delta$ is a function from $\{1, \ldots, n\} \times B$ into $\{1, \ldots, n\}$; $\delta$ is called the update function, and

• $q_0$ (the initial state of the automaton) is a function mapping $V$ into $\{0, 1\}$.


The (global) state of the automaton is an assignment of a binary value to each variable
in $V$.
On input $b \in B$, the automaton makes a transition from its current state $x =
(x_1, \ldots, x_n)$ to the state $x' = (x'_1, \ldots, x'_n)$ where

$$x'_i = x_{\delta(i,b)}; \tag{4.1}$$

each variable is updated by a simple assignment from the value of some other variable (or
possibly the same variable).

As before, we let $Q$ denote the set of all global states $q$ reachable from the initial state
$q_0$ of the automaton.

In Section 2.6 we argued that every finite-state binary output Moore automaton is 
equivalent to a simple assignment automaton where one or more of the state variables 
specifies the output. The number of state variables in the smallest corresponding simple 
assignment automaton is just the diversity of the original finite-state automaton. 

We say that a simple assignment automaton is visible if all of its local state variables 
are observable. 

We assume henceforth that we are dealing with a particular visible simple assignment
automaton $\mathcal{E} = (V, B, \delta, q_0)$, which we call the environment of the learning procedure.

We assume that $\mathcal{E}$ is reduced in the sense that, for each pair of distinct variables $x_i, x_j \in
V$, there is a state $q \in Q$ such that $x_i \ne x_j$ at $q$. (This assumption is made for simplicity
here to avoid degenerate but easily handled cases where variables are indistinguishable.)

We let $A = B^*$ denote the set of all sequences of zero or more basic actions in the
environment $\mathcal{E}$; $A$ is the set of actions possible in the environment $\mathcal{E}$, including the null
action $\lambda$.

We extend $\delta$ to the domain $\{1, \ldots, n\} \times A$ in the natural way: $\delta(i, \lambda) = i$ and $\delta(i, ba) =
\delta(\delta(i, a), b)$ for $i \in \{1, \ldots, n\}$, $b \in B$, $a \in A$. Thus $\delta(i, a)$ identifies the variable whose value
$x_i$ takes under action $a$; equation (4.1) now holds for any $a \in A$.
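
The definitions above translate almost directly into code. The following minimal sketch is an illustration only: the class name, the three-variable example and its two input symbols `"p"` (rotate) and `"c"` (copy) are assumptions, and equation (4.1) is read as "on input $b$, variable $i$ takes the old value of variable $\delta(i, b)$".

```python
class SimpleAssignmentAutomaton:
    def __init__(self, n, delta, q0):
        # delta[(i, b)] is the 1-based index of the variable whose value
        # variable i takes on input symbol b.
        self.n, self.delta, self.state = n, delta, list(q0)

    def step(self, b):
        """Apply one basic action: every variable is assigned simultaneously."""
        old = self.state
        self.state = [old[self.delta[(i, b)] - 1] for i in range(1, self.n + 1)]

    def delta_ext(self, i, action):
        """Extended update function on action strings:
        delta(i, lambda) = i and delta(i, ba) = delta(delta(i, a), b)."""
        if not action:
            return i
        b, a = action[0], action[1:]
        return self.delta[(self.delta_ext(i, a), b)]

# Hypothetical example with three variables.
# "p" rotates the values (x1 <- x3, x2 <- x1, x3 <- x2); "c" copies x1 into x2.
DELTA = {
    (1, "p"): 3, (2, "p"): 1, (3, "p"): 2,
    (1, "c"): 1, (2, "c"): 1, (3, "c"): 3,
}
q0 = [1, 0, 0]
m = SimpleAssignmentAutomaton(3, DELTA, q0)
for b in "cp":                 # execute c, then p
    m.step(b)

# The extended delta predicts the same state directly from q0.
predicted = [q0[m.delta_ext(i, "cp") - 1] for i in range(1, 4)]
```

The recursion in `delta_ext` peels off the first symbol of the action but applies it last, which matches the fact that the value reaching variable $i$ is traced backwards through the action sequence.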

Finally, we assume that $\mathcal{E}$ is strongly connected: it is possible to get from any state in $Q$
to any other. (Otherwise, it may be impossible to infer $\mathcal{E}$ completely, since $\mathcal{E}$ will get stuck
in one of its several strongly connected components.)

Figure 4.1: The Effect of Action p in Our Example Simple Assignment Automaton


4.2 Example 


To make things concrete, consider the simple assignment automaton $\mathcal{E}$ illustrated in
Figure 4.1.

Here $\mathcal{E}$ has $n$ binary state variables $\{x_1, \ldots, x_n\}$, where $n$ is even. We think of the values
of these variables as being drawn from the set {Red, Green}.

We imagine the $n$ variables as being divided into $n/2$ “columns”, where $x_{2i-1}$ and $x_{2i}$
are in the same column, for $i = 1, \ldots, n/2$.

There are four input symbols, or “basic actions”: $p, q, r, s$. On any input, the variables
in the $i$-th column are updated in some way from the variables in the $(i-1)$-st column. (We
assume that the variables in the first column never change value: $x_1$ is always Red and $x_2$
is always Green.) Since each of $x_{2i-1}$ and $x_{2i}$ can be assigned one of $x_{2i-3}$ or $x_{2i-2}$ in two
ways, there are a total of four distinct ways in which the variables in column $i$ can depend
upon those in column $i - 1$. Each input symbol is associated with one of these possibilities,
but in a manner that is arbitrary and varies from column to column. Figure 4.1 illustrates
the effect of action $p$, and a typical state of the automaton; the other three actions could
be illustrated with similar diagrams.

It is important to note that two of the four possibilities are guaranteed to give a column 
a monotone coloration, independent of whether the column to the left has a monotone or a 
mixed coloration. 

This automaton has a number of states which is exponential in $n$: it is easy to see
that every column except the first can be made all Red or all Green, and there are many
other states where columns other than the first have a mixed coloration.

However, it is easy to see that in order for a column to receive a mixed coloration, its 
neighbor to the left must have had a mixed coloration on the previous step. Furthermore, 
mixed colorations are easily destroyed as the column colorations move rightwards. Once a 
column has a monotone coloration, this coloration propagates to the right unchanged with 
each input. It should be clear that a random string of input will have a small chance of 
giving a mixed coloration to any columns except a few of the leftmost ones. 
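
The column behavior described above can be simulated directly. The following sketch is an assumption-laden illustration: the random assignment of wirings to input symbols, the tuple representation of a column as (top, bottom), and the names `make_columns`, `step` and `mixed` are invented for the example. It checks, along a random input string, the stated fact that a column can be mixed only if its left neighbor was mixed on the previous step.

```python
import random
from typing import Dict, List, Tuple

ACTIONS = "pqrs"
# The four ways a column (top, bottom) can copy from the column to its left:
# each entry names the left-hand cell (0 = top, 1 = bottom) feeding top and bottom.
WIRINGS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def make_columns(num_columns: int, rng: random.Random) -> List[Dict[str, Tuple[int, int]]]:
    """For every column but the first, assign the four wirings to p, q, r, s at random."""
    table = []
    for _ in range(num_columns - 1):
        order = WIRINGS[:]
        rng.shuffle(order)
        table.append(dict(zip(ACTIONS, order)))
    return table

def step(cols, table, b):
    """Apply input b: column c is rewritten from the old contents of column c - 1."""
    new = [cols[0]]  # the first column never changes
    for c in range(1, len(cols)):
        top_src, bottom_src = table[c - 1][b]
        new.append((cols[c - 1][top_src], cols[c - 1][bottom_src]))
    return new

def mixed(column) -> bool:
    """A column is mixed when its two cells have different colors."""
    return column[0] != column[1]

rng = random.Random(0)
table = make_columns(8, rng)
cols = [("Red", "Green")] + [("Red", "Red")] * 7
for _ in range(1000):
    nxt = step(cols, table, rng.choice(ACTIONS))
    # A column can only be mixed if its left neighbor was mixed one step earlier.
    assert all(mixed(cols[c - 1]) for c in range(1, 8) if mixed(nxt[c]))
    cols = nxt
```

Two of the four wirings, `(0, 0)` and `(1, 1)`, copy a single cell into both positions, which is why they always produce a monotone column.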

We now observe that in order for an inference algorithm to figure out how the later 
columns are wired together, the algorithm must propagate the mixed colorations all the 
way down to the right. This can only be accomplished by careful planning and execution 
of experiments, and not by random walk techniques. 

We view this example as a fancy kind of “combination lock”, since the algorithm must
figure out a correct “combination” for giving column $i-1$ a mixed coloration before it
can figure out a correct combination for column $i$. (Of course, there are many correct
combinations, but there are many more incorrect ones.)

It is not too hard to figure out how to approach this particular example, given all of the
“side information” stated above. However, we must remember that the inference algorithm
we seek is only told that it is to infer a simple assignment automaton where all local state
variables are visible; it is not told such things as that the variables are paired up into
columns, that each column is updated from the one to its left, and so on. In the absence of
such side information, the general problem can be challenging.


4.3 Our Inference Procedure 


We now present a procedure for inferring $\mathcal{E}$ by systematic experimentation. Our
procedure is given as input $V$, $B$, and the ability to experiment with $\mathcal{E}$ by executing
basic actions (i.e., giving the automaton inputs) and observing the state changes. Our procedure
outputs the unknown function $\delta$, in time polynomial in $n = |V|$ and $|B|$.

The algorithm maintains, as its fundamental data structure, a candidate set $C(i,b)$ of
possible values for the update function $\delta(i,b)$, for each variable $x_i$ and each
$b \in B$. Initially $C(i,b) = V$ for all $i$ and $b$.

Our basic strategy is to repeatedly plan and execute experiments which cause at least
one $C(i,b)$ to shrink. When no such experiment is possible, $C(i,b) = \{\delta(i,b)\}$ for all
$i$ and $b$, so that $\delta$ has been identified.


Definition 4 We say $b \in B$ is an immediately useful experiment if there exist $i, j, k$ such
that $j$ and $k$ are both in $C(i,b)$, and $x_j \neq x_k$.

If we execute the immediately useful experiment $b$ then either $j$ or $k$ is removed from
$C(i,b)$ (e.g. $j$ is removed if the new value for $x_i$ differs from the old value for $x_j$).
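
A minimal sketch of this step, assuming candidate sets are stored as Python sets keyed by `(i, b)` with 0-based variable indices; the function names `find_immediately_useful` and `prune` and the two-variable example are hypothetical and introduced only for illustration:

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Candidates = Dict[Tuple[int, str], Set[int]]  # (i, b) -> candidate set C(i, b)

def find_immediately_useful(C: Candidates, state: List[int],
                            actions: Sequence[str]) -> Optional[str]:
    """Return some b for which a candidate set C(i, b) contains two variables
    that currently differ, or None when no such basic action exists."""
    for b in actions:
        for i in range(len(state)):
            if len({state[j] for j in C[(i, b)]}) > 1:
                return b
    return None

def prune(C: Candidates, b: str, old_state: List[int], new_state: List[int]) -> None:
    """After executing b, keep j in C(i, b) only if the old x_j equals the new x_i."""
    for i in range(len(old_state)):
        C[(i, b)] = {j for j in C[(i, b)] if old_state[j] == new_state[i]}

# Made-up illustration: the hidden action "b" swaps x_0 and x_1.
C = {(0, "b"): {0, 1}, (1, "b"): {0, 1}}
old = [0, 1]
assert find_immediately_useful(C, old, ["b"]) == "b"
new = [1, 0]  # what the experimenter observes after executing "b"
prune(C, "b", old, new)
assert C == {(0, "b"): {1}, (1, "b"): {0}}
```

Note that `find_immediately_useful` consults only the candidate sets and the current state, never $\delta$ itself.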

Finding an immediately useful experiment (if one exists) is easy since it requires knowledge
of $C$ but not of $\delta$. But what shall we do if there are no immediately useful experiments
to do?

In such a case, there may exist some “setup action” $a \in A$ that will make $b \in B$ an
immediately useful experiment. We call the combined action $ab$ a “useful experiment”.


Definition 5 Let $\sigma = ab$ where $a \in A$, $b \in B$. We call $\sigma$ a useful experiment if
there exist $i, j, k$ such that $x_{\delta(j,a)} \neq x_{\delta(k,a)}$ and $j$ and $k$ are both in
$C(i,b)$.


The trouble with this notion is that to tell if $ab$ is a useful experiment requires knowing
the unknown function $\delta$, in order to predict the effect of setup action $a$. We need an
effective way of finding useful experiments.

We introduce the notion of a “plausible experiment” to remedy this defect. 

First, as with the function $\delta$, we extend $C$ to the domain $\{1,\ldots,n\} \times A$:
$C(i,\lambda) = \{i\}$ and $C(i,ba) = \bigcup_{j \in C(i,a)} C(j,b)$ for $i \in \{1,\ldots,n\}$,
$a \in A$, $b \in B$.


Definition 6 We call $\sigma \in A$ a plausible experiment if there exist $i, j, k$ such that $j$
and $k$ are both in $C(i,\sigma)$, and $x_j \neq x_k$.
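
The extended candidate sets and the plausibility test of Definition 6 can be sketched as follows. The representation (sets keyed by `(i, b)`, 0-based indices, action sequences as strings executed left to right) and the names `candidates_ext` and `is_plausible` are assumptions made for this example only.

```python
from typing import Dict, List, Sequence, Set, Tuple

Candidates = Dict[Tuple[int, str], Set[int]]  # (i, b) -> candidate set C(i, b)

def candidates_ext(i: int, actions: Sequence[str], C: Candidates) -> Set[int]:
    """Extended candidate set: C(i, lambda) = {i} and
    C(i, b a) = union of C(j, b) over j in C(i, a)."""
    current = {i}
    # As with delta, the recursion peels off the last-executed action first.
    for b in reversed(actions):
        nxt: Set[int] = set()
        for j in current:
            nxt |= C[(j, b)]
        current = nxt
    return current

def is_plausible(actions: Sequence[str], C: Candidates, state: List[int]) -> bool:
    """True if some C(i, actions) contains two variables with different current values."""
    return any(len({state[j] for j in candidates_ext(i, actions, C)}) > 1
               for i in range(len(state)))

# Made-up example: "q" is already known to swap x_1 and x_2, while for "p"
# the sources of x_0 and x_1 are still ambiguous between x_0 and x_1.
C = {(0, "p"): {0, 1}, (1, "p"): {0, 1}, (2, "p"): {2},
     (0, "q"): {0}, (1, "q"): {2}, (2, "q"): {1}}
state = [0, 0, 1]
assert not is_plausible("p", C, state)   # x_0 and x_1 are equal right now
assert is_plausible("qp", C, state)      # running q first would separate them
```

In the example, `"qp"` is plausible because $C(0, qp) = C(0,q) \cup C(1,q) = \{0, 2\}$ and $x_0 \neq x_2$ in the current state.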


Knowledge of $C$, but not $\delta$, is all that is required to find plausible experiments.
Note that all useful experiments are plausible since $\delta(i,a) \in C(i,a)$ always. However,
not all plausible experiments are useful. Our inference procedure depends on the following
critical theorem.

Theorem 13 The shortest plausible experiment is also the shortest useful experiment. 


Proof: Because every useful experiment is plausible, we need only show that the shortest
plausible experiment is useful.

Let $\sigma = ab$, $a \in A$, $b \in B$ be the shortest plausible experiment. Let $j, k$ be members
of $C(i,\sigma)$ for which $x_j \neq x_k$. Then there exist $r, s \in C(i,b)$ for which
$j \in C(r,a)$ and $k \in C(s,a)$. Since $\sigma$ is the shortest plausible experiment, and because
$|a| < |\sigma|$, all the variables in $C(r,a)$ must have the same value. In particular,
$x_{\delta(r,a)} = x_j$, and likewise, $x_{\delta(s,a)} = x_k$. Therefore
$x_{\delta(r,a)} \neq x_{\delta(s,a)}$, so that $\sigma$ is useful.

Not only is the shortest plausible experiment useful, but there always exists a plausible
experiment up until the point when the inference task is finished.


Theorem 14 If there exist an $i$ and a $b$ such that $|C(i,b)| > 1$, then there exists a plausible
experiment (and thus a shortest plausible experiment).


Proof: Let $x_j$ and $x_k$ be two distinct variables in $C(i,b)$. By assumption, there exists a
global state $q$ for which $x_j$ and $x_k$ obtain differing values, and such a state $q$ is
reachable from the current state (via some action $a$). Then $\sigma = ab$ is a useful (and
therefore plausible) experiment. $\blacksquare$


4.3.1 The Basic Inference Algorithm 


We now give a high-level description of our inference procedure, assuming the availability
of a subroutine which plans the shortest useful experiment.

Initially, each $C(i,b) = V$. Our procedure then repeatedly finds and executes useful
experiments, each of which eliminates at least one variable from at least one candidate set.

How many experiments are performed before each candidate set is a singleton? Since
there are $|B|n$ candidate sets, each initially of size $n$, at most $|B|n^2$ experiments are
performed. The following theorem gives a tighter bound.


Theorem 15 After no more than $|B|n$ useful experiments are performed, each candidate
set will be a singleton set.


Proof: An easy induction shows that, between each experiment, for fixed $b \in B$, two
candidate sets $C(i,b)$ and $C(j,b)$ must either be disjoint or identical. (Two such sets will
be identical if and only if $x_i = x_j$ in every global state seen so far. When a state is first
observed for which $x_i \neq x_j$, the common set $C(i,b) = C(j,b)$ is split into two disjoint
nonempty blocks, one of which becomes the new $C(i,b)$ and one of which becomes the new
$C(j,b)$.) Thus each set $C(i,b)$ is a block of a partition $S_b$ of a subset of $V$ into
pairwise-disjoint, non-empty subsets. Initially, $S_b = \{V\}$; there is only one block. Each
useful experiment ending in $b$ causes at least one set $C(i,b)$ to shrink, and so causes one or
more of the blocks in $S_b$ to either split or shrink. After $n$ such operations, each block of
$S_b$ (and therefore each candidate set $C(i,b)$ as well) will be a singleton. Thus, at most $n$
experiments are performed ending in each of the $|B|$ basic actions.


The proof of this theorem suggests an efficient representation of the candidate sets.
Rather than storing the sets explicitly, we maintain the partition $S_b$, and represent each
$C(i,b)$ as a pointer to one of the blocks in $S_b$. This allows faster updating of the candidate
sets between each experiment.

Figure 4.2 gives a high-level description of our procedure (less the assumed experiment
planning subroutine PLAN-EXP).

Input: $V$, $B$, and access to the environment $\mathcal{E} = (V, B, \delta, q_0)$.
Output: $\delta$
Procedure:
    for $b \in B$
        $S_b \leftarrow \{V\}$
        for $i \in \{1,\ldots,n\}$: $C(i,b) \leftarrow V$.
    while PLAN-EXP can find a useful experiment $\sigma = ab$ do
        Execute $a$. Let $(x_1,\ldots,x_n)$ be the resulting state.
        Execute $b$. Let $(x'_1,\ldots,x'_n)$ be the resulting state.
        for $s \in S_b$
            Let $\pi(s,0) = \{i \in s \mid x_i = 0\}$.
            Let $\pi(s,1) = \{i \in s \mid x_i = 1\}$.
        for $i \in \{1,\ldots,n\}$: $C(i,b) \leftarrow \pi(C(i,b), x'_i)$
        $S_b \leftarrow \bigcup_{i \in \{1,\ldots,n\}} \{C(i,b)\}$
    for $i \in \{1,\ldots,n\}$, $b \in B$
        Output “$\delta(i,b) = r$”, where $C(i,b) = \{x_r\}$.


Figure 4.2: The Basic Inference Algorithm


Observe that each step of the main while loop takes $O(n)$ time, except possibly for the
execution of the experiment returned by PLAN-EXP, whose length we discuss below.


4.3.2 The Experiment Planning Subroutine 


The subroutine PLAN-EXP is given the candidate sets and the current state, and is asked 
to find the shortest useful experiment. By Theorem 13, this experiment is also the shortest 
plausible experiment. 

We can find the shortest plausible experiment by searching the space of unordered pairs
of variables $\{j,k\}$, both in some set $C(i,\sigma)$, until we find one for which
$x_j \neq x_k$. More precisely, we do a breadth-first search of the forest of trees in which the
root of each search tree is a pair $\{i,i\}$, and the $b$-children of each node $\{j,k\}$ are the
pairs $\{j',k'\}$ for which $j' \in C(j,b)$, $k' \in C(k,b)$. When a pair $\{j,k\}$ is found for
which $x_j \neq x_k$, we return the experiment which is the path from the node $\{j,k\}$ to the
root of its tree.

Since we search a forest of $O(n^2)$ vertices, each of degree $O(|B|n^2)$, this experiment
planning subroutine runs in time $O(|B|n^4)$. Furthermore, the length of the experiment
returned is bounded by the size of the search space, $n^2$. Thus, the entire inference algorithm
will run in time $O(|B|^2 n^5)$, having executed $|B|n^3$ basic actions.
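
Putting the main loop of Figure 4.2 together with the breadth-first planner just described gives a short end-to-end sketch. Everything concrete here is an assumption made for illustration: the 0-based indices, explicit Python sets for the candidate sets (rather than pointers into the partition $S_b$), the names `plan_exp_bfs`, `infer` and `ToyEnv`, and the three-variable test environment. It also presumes a reduced, strongly connected environment, as in the text.

```python
from collections import deque
from itertools import product

def plan_exp_bfs(C, state, actions):
    """Breadth-first search over pairs {j, k}. Returns a shortest plausible
    experiment as a list of basic actions (first executed first), or None."""
    n = len(state)
    seen = {(i, i) for i in range(n)}
    queue = deque(((i, i), []) for i in range(n))  # roots {i, i}, null action
    while queue:
        (j, k), sigma = queue.popleft()
        if state[j] != state[k]:
            return sigma
        for b in actions:
            for j2, k2 in product(C[(j, b)], C[(k, b)]):
                pair = (min(j2, k2), max(j2, k2))
                if pair not in seen:
                    seen.add(pair)
                    queue.append((pair, [b] + sigma))  # b is executed before sigma
    return None

def infer(n, actions, execute, observe):
    """Shrink candidate sets with planned experiments until none is plausible."""
    C = {(i, b): set(range(n)) for i in range(n) for b in actions}
    while True:
        sigma = plan_exp_bfs(C, observe(), actions)
        if sigma is None:
            break
        *setup, b = sigma
        for a in setup:
            execute(a)
        before = observe()
        execute(b)
        after = observe()
        for i in range(n):
            # Keep j only if the old x_j matches the new x_i.
            C[(i, b)] = {j for j in C[(i, b)] if before[j] == after[i]}
    return {key: min(c) for key, c in C.items()}

class ToyEnv:
    """Made-up simple assignment automaton used only to exercise the sketch."""
    def __init__(self, delta, state):
        self.delta, self.state = delta, list(state)
    def execute(self, b):
        old = self.state
        self.state = [old[self.delta[(i, b)]] for i in range(len(old))]
    def observe(self):
        return list(self.state)

# "r" rotates the three variables, "s" swaps x_0 and x_1.
true_delta = {(0, "r"): 1, (1, "r"): 2, (2, "r"): 0,
              (0, "s"): 1, (1, "s"): 0, (2, "s"): 2}
env = ToyEnv(true_delta, [1, 0, 0])
inferred = infer(3, ["r", "s"], env.execute, env.observe)
assert inferred == true_delta
```

The planner touches only the candidate sets and the observed state; the hidden `true_delta` is used solely inside `ToyEnv` to simulate the environment.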


We now improve these bounds with a more efficient subroutine (Figure 4.3) which maintains
equivalence classes of variables using a “weighted union and collapsing find” data
structure. Initially, all the elements of each candidate set (or, equivalently, of each
partition block) are merged into the same equivalence class. To merge a pair $\{j,k\}$, we check
whether the two are in the same equivalence class; if they are not, their equivalence classes are
UNIONed and the pair is placed on a queue. Thus, a UNION operation is always coupled
with an addition to the queue. When the pair $\{j,k\}$ is dequeued, the members of $C(j,b)$
are merged with those of $C(k,b)$ for all the basic actions $b$, and the process continues.

Input: $C(i,b)$ for $i \in \{1,\ldots,n\}$, $b \in B$, and $x_1,\ldots,x_n$
Output: a useful experiment $\sigma$
Procedure:
    for $i \in \{1,\ldots,n\}$: Place $i$ in an equivalence class by itself.
    for $b \in B$, $s \in S_b$
        Let $j$ be an arbitrary member of $s$.
        $J \leftarrow$ FIND($j$)
        for $k \in s - \{j\}$
            $K \leftarrow$ FIND($k$)
            if $J \neq K$ then
                $J \leftarrow$ UNION($J$, $K$)
                enqueue ($\{j,k\}$, $b$)
    while queue not empty do
        dequeue ($\{j,k\}$, $\sigma$)
        if $x_j \neq x_k$ then return $\sigma$
        for $b \in B$
            let $j'$ be an arbitrary member of $C(j,b)$
            let $k'$ be an arbitrary member of $C(k,b)$
            $J \leftarrow$ FIND($j'$), $K \leftarrow$ FIND($k'$)
            if $J \neq K$ then
                UNION($J$, $K$)
                enqueue ($\{j',k'\}$, $b\sigma$)
    return FAIL


Figure 4.3: The Experiment Planning Subroutine PLAN-EXP

The subroutine is constructed so that if $(\{j,k\},\sigma)$ is on the queue, then
$j,k \in C(i,\sigma)$ for some $i$. Thus, if $x_j \neq x_k$, then $\sigma$ is a plausible
experiment.

During the execution of the subroutine, if $(\{j,k\},\sigma)$ was the last pair enqueued, then
the current search depth is defined to be $|\sigma|$. It is clear that the search depth increases
incrementally.
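
The planning subroutine of Figure 4.3 might be rendered in Python roughly as follows. This is a sketch under stated assumptions: variables are 0-indexed, candidate sets are explicit non-empty Python sets keyed by `(i, b)`, experiments are lists of basic actions with the first-executed action first, the distinct candidate sets for each $b$ stand in for the blocks of $S_b$, and the helper names `find`, `union` and `merge` are hypothetical.

```python
from collections import deque

def plan_exp(C, state, actions):
    """Union-find version of PLAN-EXP; returns an experiment or None (FAIL)."""
    n = len(state)
    parent = list(range(n))   # each variable starts in its own class
    size = [1] * n

    def find(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:              # collapsing find (path compression)
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        """Weighted union of two class representatives."""
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]

    queue = deque()

    def merge(j, k, sigma):
        """UNION the classes of j and k if they differ, and enqueue the pair."""
        rj, rk = find(j), find(k)
        if rj != rk:
            union(rj, rk)
            queue.append((j, k, sigma))

    # Initialization: merge all members of each distinct candidate set.
    for b in actions:
        blocks = sorted({tuple(sorted(C[(i, b)])) for i in range(n)})
        for block in blocks:
            for k in block[1:]:
                merge(block[0], k, [b])

    while queue:
        j, k, sigma = queue.popleft()
        if state[j] != state[k]:
            return sigma
        for b in actions:
            # Any member will do: each candidate set already lies in one class.
            merge(min(C[(j, b)]), min(C[(k, b)]), [b] + sigma)
    return None

# Made-up example: "q" is known to swap x_1 and x_2; "p" is still ambiguous
# between x_0 and x_1 as the source for x_0 and x_1.
C = {(0, "p"): {0, 1}, (1, "p"): {0, 1}, (2, "p"): {2},
     (0, "q"): {0}, (1, "q"): {2}, (2, "q"): {1}}
assert plan_exp(C, [0, 0, 1], "pq") == ["q", "p"]
```

In the example no single basic action is immediately useful, and the subroutine returns the two-step experiment that first executes `q` (separating $x_0$ from $x_1$) and then `p`.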


The next theorem is useful in analyzing the subroutine and establishing its correctness.


Theorem 16 Suppose $j,k \in C(i,\sigma)$. Then the subroutine of Figure 4.3 (if not interrupted
to return an answer) will merge $j$ and $k$ into the same equivalence class before the search
depth exceeds $|\sigma|$.


Proof: By induction on $|\sigma|$.

If $|\sigma| = 1$, then $j,k \in C(i,b)$ for some $b \in B$, and $j$ and $k$ are merged into the
same equivalence class during the initialization phase when the search depth is exactly one.

Let $h > 1$ and suppose that the theorem’s statement holds when $|\sigma| < h$. Given
$j,k \in C(i,\sigma)$, where $|\sigma| = h$, we wish to show that $j$ and $k$ are merged before the
search depth exceeds $h$.

Let $\sigma = ba$, $b \in B$, $a \in A$ and let $r,s$ be such that $r,s \in C(i,a)$ and
$j \in C(r,b)$, $k \in C(s,b)$. Since $|a| = h-1$, $r$ and $s$ have been merged by the time the
search depth reaches $h$, by our inductive hypothesis. Thus, there must have been a series of
UNION operations performed to bring this about. Since each UNION operation is coupled with an
addition to the queue, there must have been a series of enqueuings of the form:

$$
\begin{array}{l}
(\{r = r_0, r_1\},\ \sigma_0) \\
(\{r_1, r_2\},\ \sigma_1) \\
(\{r_2, r_3\},\ \sigma_2) \\
\qquad\vdots \\
(\{r_m, r_{m+1} = s\},\ \sigma_m).
\end{array}
$$

When $(\{r_\ell, r_{\ell+1}\}, \sigma_\ell)$ is dequeued, the members of the candidate sets
$C(r_\ell, b)$ and $C(r_{\ell+1}, b)$ are merged into one equivalence class, so that, transitively,
the sets $C(r,b)$ and $C(s,b)$ are merged into one. In particular, the equivalence classes of $j$
and $k$ are merged. Since each $|\sigma_\ell| < h$, this happens before the search depth exceeds
$h$.


Corollary 2 The first plausible experiment discovered by the subroutine (i.e. the one
returned) will also be the shortest plausible experiment.


Corollary 3 If there exists a plausible experiment, then the subroutine will discover it.
That is, a return of FAIL by the procedure will be correct.


Clearly, the running time of the procedure is bounded by the number of UNION-FIND
operations. Since we begin with $n$ equivalence classes, no more than $n$ UNIONs can be
performed. Therefore, $n$ bounds the total number of enqueuings, and so the search depth
as well. Based on this fact and the fact that $S_b$ is a partition of at most $n$ elements, we see
that $O(|B|n)$ FIND operations are performed, yielding a running time for the subroutine of
$O(|B|n \cdot \alpha(|B|n))$, where $\alpha$ is an extremely slow growing functional inverse of
Ackermann’s function. (See [17].) Finally, the length of the experiment constructed cannot
exceed the maximum search depth of $n$. Thus, we have:


Theorem 17 Our inference algorithm correctly infers the environment $\mathcal{E}$ in time
$O(|B|^2 n^2 \alpha(|B|n))$, having executed no more than $|B|n^2$ basic actions.


4.4 Optimality


In this section, we prove that the upper bound on the number of basic actions executed by
our inference algorithm is (within a constant factor of) the best possible.


Theorem 18 There exists a constant $\varepsilon > 0$ such that, for all $n > 4$, $m > 3$, there
exists a simple assignment automaton $\mathcal{E}$ for which $|B| = m$ and $|V| = n$, and which
cannot be inferred by any algorithm which executes fewer than $\varepsilon |B| n^2$ basic actions.


Proof: Consider the following “combination lock” environment $\mathcal{E}$, similar to the
example described in Section 4.2: $n = |V| > 4$, $|B| > 3$. $B$ contains a special “clear” symbol
$c$. The “lock’s combination” is the sequence $a_1 a_2 \cdots a_{n-2}$ where $a_1 = c$ and
$a_i \in B - \{c\}$ for $1 < i < n-1$. The update function $\delta$ is defined as follows:

• $\delta(1,b) = 1$ for $b \in B$
• $\delta(n,b) = n$ for $b \in B$
• $\delta(i, a_{i-1}) = i-1$ for $1 < i < n$
• $\delta(i,b) = n$ for $1 < i < n$, $b \in B - \{a_{i-1}\}$.

Initially, only $x_1$ is true.

It is easy to verify that $x_1$ is always true, $x_n$ is always false, and no more than one
variable at a time (other than $x_1$) can be true. If $1 < i < n$, the variable $x_i$ will be true
if and only if the action sequence $a_1 a_2 \cdots a_{i-1}$ was just executed.
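
The claimed invariants of the combination lock can be checked by simulation. The sketch below is illustrative only: the alphabet `["c", "u", "v"]`, the particular combination, the 1-based list layout (index 0 unused), and the names `make_lock` and `step` are assumptions introduced for the example.

```python
import random

def make_lock(n, actions, combination):
    """Build delta for the lock; combination = [a_1, ..., a_{n-2}] with a_1 = 'c'."""
    assert len(combination) == n - 2 and combination[0] == "c"
    assert all(a != "c" for a in combination[1:])
    delta = {}
    for b in actions:
        delta[(1, b)] = 1          # x_1 keeps its own value
        delta[(n, b)] = n          # x_n keeps its own value
        for i in range(2, n):
            # combination[i - 2] is a_{i-1}
            delta[(i, b)] = i - 1 if b == combination[i - 2] else n
    return delta

def step(x, delta, b, n):
    """Execute basic action b; x[0] is an unused placeholder."""
    return [None] + [x[delta[(i, b)]] for i in range(1, n + 1)]

n, actions = 6, ["c", "u", "v"]
combination = ["c", "u", "v", "u"]
delta = make_lock(n, actions, combination)
x = [None, True] + [False] * (n - 1)   # initially only x_1 is true
rng = random.Random(1)
history = []
for _ in range(2000):
    b = rng.choice(actions)
    x = step(x, delta, b, n)
    history.append(b)
    assert x[1] and not x[n]            # x_1 always true, x_n always false
    assert sum(x[2:n]) <= 1             # at most one other variable is true
    for i in range(2, n):
        # x_i is true exactly when a_1 ... a_{i-1} was just executed
        assert x[i] == (history[-(i - 1):] == combination[:i - 1])
```

Entering the full combination from any state sets $x_{n-1}$, and any wrong symbol along the way resets the progress, which is what makes the deeper variables hard to reach by random inputs.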

Consider the set $P$ of pairs $(i,b)$ where $2 < i < n$, $b \in B - \{c\}$ and $\delta(i,b) = n$
(i.e., $b \neq a_{i-1}$). To positively identify $\mathcal{E}$, an inference algorithm must, for
each such pair in $P$, eliminate the possibility that $\delta(i,b) = i-1$. It is not hard to see
that the only experiment which will do this is the sequence
$\sigma_{i,b} = c\,a_2 a_3 \cdots a_{i-2}\,b$. Let $E = \{\sigma_{i,b} \mid (i,b) \in P\}$. Clearly,
$|E| = |P|$. At some time, each experiment in $E$ must be executed; however, no two of these
experiments can overlap by our construction. Thus, the number of basic actions executed
must be at least

$$
\sum_{\sigma \in E} |\sigma| \;=\; \sum_{2<i<n} (|B|-2)(i-1) \;=\; \Omega(|B|\,n^2).
$$
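
The sum $\sum_{2<i<n} (|B|-2)(i-1)$ has a simple closed form, which also suggests one admissible value for the constant in Theorem 18. The derivation below is a worked check rather than part of the original argument; the constant $1/24$ is only one convenient choice, obtained from the crude estimates $|B| - 2 \geq |B|/3$ (valid for $|B| \geq 3$) and $n - 3 \geq n/4$ (valid for $n \geq 4$).

$$
\sum_{2<i<n} (|B|-2)(i-1)
\;=\; (|B|-2)\sum_{k=2}^{n-2} k
\;=\; (|B|-2)\left(\frac{(n-1)(n-2)}{2} - 1\right)
\;=\; (|B|-2)\,\frac{n(n-3)}{2}
\;\geq\; \frac{|B|}{3}\cdot\frac{n}{2}\cdot\frac{n}{4}
\;=\; \frac{|B|\,n^2}{24}.
$$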
Chapter 5
Conclusions and Open Problems


We have presented a new representation for finite-state systems (environments), and pro- 
posed a new procedure for inferring a finite state environment from its input/output be- 
havior. 

In the case of permutation environments, our procedure can infer the structure of the
environment in expected time polynomial in the diversity of the environment and
$\log(1/\varepsilon)$, where $\varepsilon$ is an arbitrary positive upper bound given on the
probability that our procedure will return an incorrect result.

For general environments, our procedure appears to work well in practice, although we 
don’t have a proof to this effect. 

When the environment has lots of “structure”, the diversity will typically be many 
orders of magnitude smaller than the number of global states of the environment; in these 
cases our procedure can offer many orders of magnitude improvement in running time over 
previous methods. 

Finally, we have shown how to infer any visible simple assignment automaton in time 
polynomial in the number of variables and basic actions in that automaton, and have shown 
that our procedure is optimal to within a constant factor in terms of the number of basic 
actions executed. 

Future work should be directed toward methods of handling, or handling better, a
broader class of environments. Environments apparently not handled well by our current
techniques include those with:

• Actions with conditional effects (such as a Grid World with boundaries, so that the
  “step ahead” action has no effect if the robot is facing and up against the boundary).

• Dependence on global state variables or control variables (e.g. an “on-off” switch in
  the Car Radio World).

• States which are difficult to reach (consider the “combination lock” environment of
  Chapter 4 which is almost always in a locked state, and is unlikely to be unlocked by
  trying random combinations).

• Actions with probabilistic effects (such as a “spin” operator in the Grid World, which
  leaves the robot facing in a random direction).

• Actions or sensations which are subject to noise, and so may have unreliable effects
  or provide unreliable information.

• Environments which are infinitely large (such as an infinitely long Register World).


The question of how to apply the planning techniques of the last chapter to the general
problem of inferring automata with hidden variables remains open. Also open is the question
of what other classes of automata can be inferred by techniques similar to those used for
inference of permutation environments. Finally, what other models of learning (such as
mistake bound learning as in [13]) can be applied to the problem of inference of finite
automata?